Abstract. The asymptotic behavior of stochastic gradient algorithms is studied. Relying
on some results of differential geometry (the Lojasiewicz gradient inequality), the almost sure
point-convergence is demonstrated and relatively tight almost sure bounds on the convergence
rate are derived. In sharp contrast to all existing results of this kind, the asymptotic results
obtained here do not require the objective function (associated with the stochastic gradient
search) to have an isolated minimum at which the Hessian of the objective function is strictly
positive definite. Using the obtained results, the asymptotic behavior of recursive prediction
error identification methods is analyzed. The convergence and convergence rate of supervised
learning algorithms are also studied relying on these results.
1. Introduction. Stochastic optimization is at the core of many engineering,
statistics and finance problems. A stochastic optimization problem can be described
as the minimization (or maximization) of an objective function in a situation when
only noise-corrupted observations of the function values are available. Such a problem
can be solved efficiently by stochastic gradient search, a stochastic approximation
version of the deterministic steepest descent method. Due to their excellent performance
(generality, robustness, low complexity, easy implementation), stochastic gradient
algorithms have gained wide attention in the literature and have found a broad range
of applications in diverse areas such as signal processing, system identification,
automatic control, machine learning, operations research, statistical inference,
econometrics and finance (see e.g. [2], 0, [21, [10], [H], [IS], [l2|, [22], [24], [25], [26] and
references cited therein).

Various asymptotic properties of stochastic gradient algorithms have been the
subject of a number of papers and books (see [1], [14], [16], [24], [26] and references
cited therein). Among them, the almost sure convergence and the convergence rate
have received the greatest attention, as these properties most precisely characterize
the asymptotic behavior and efficiency of stochastic gradient search. Although the
existing results provide a good insight into the convergence and convergence rate, they
hold only under very restrictive conditions. More specifically, the existing results
require the objective function (which the stochastic gradient search minimizes) to
have an isolated minimum such that the Hessian of the objective function is strictly
positive definite at the minimum and such that the attraction domain of the minimum
is visited infinitely often by the algorithm iterates. However, in the case of complex,
high-dimensional, highly nonlinear algorithms, this is not only hard (if possible at all)
to verify, but is likely not to be true.

In this paper, the convergence and convergence rate of stochastic gradient search
are analyzed when the objective function has multiple non-isolated minima (notice
that at a non-isolated minimum, the Hessian can be semi-definite at best). Using
some results of differential geometry (the Lojasiewicz gradient inequality), the almost
sure point-convergence is demonstrated and relatively tight almost sure bounds on
the convergence rate are derived. The obtained results cover a wide class of complex
stochastic gradient algorithms. We show how they can be used to analyze the
asymptotic behavior of recursive prediction error algorithms for identification of
linear stochastic systems. We also show how the convergence and convergence rate of
supervised learning in feedforward neural networks can be analyzed using the results
obtained here.

* Department of Mathematics, University of Bristol, University Walk, Bristol BS8 1TW, United
Kingdom (v.b.tadic@bristol.ac.uk).

The paper is organized as follows. In Section 2, stochastic gradient algorithms
with additive noise are considered and the main results of the paper are presented.
Section 3 is devoted to stochastic gradient algorithms with Markovian dynamics.
Sections 4 and 5 contain examples of the results reported in Sections 2 and 3. In Section 4,
supervised learning algorithms for feedforward neural networks are studied, while
recursive prediction error algorithms for identification of linear stochastic systems are
analyzed in Section 5. Sections 6 - 9 contain the proofs of the results presented in
Sections 2 - 5.

2. Main Results. In this section, the convergence and convergence rate of the
following algorithm are analyzed:

$$\theta_{n+1} = \theta_n - \alpha_n (\nabla f(\theta_n) + \xi_n), \quad n \geq 0. \tag{2.1}$$

Here, $f : \mathbb{R}^{d_\theta} \to \mathbb{R}$ is a differentiable function, while $\{\alpha_n\}_{n \geq 0}$ is a sequence of
positive real numbers. $\theta_0$ is an $\mathbb{R}^{d_\theta}$-valued random variable defined on a probability space
$(\Omega, \mathcal{F}, P)$, while $\{\xi_n\}_{n \geq 0}$ is an $\mathbb{R}^{d_\theta}$-valued stochastic process defined on the same
probability space. To allow more generality, we assume that for each $n \geq 0$, $\xi_n$ is a
random function of $\theta_0, \dots, \theta_n$. In the area of stochastic optimization, recursion (2.1)
is known as a stochastic gradient search (or stochastic gradient algorithm), while
function $f(\cdot)$ is referred to as an objective function. For further details see [22], [26]
and references given therein.

Throughout the paper, unless otherwise stated, the following notation is used.
The Euclidean norm is denoted by $\|\cdot\|$, while $d(\cdot,\cdot)$ stands for the distance induced
by the Euclidean norm. $S$ is the set of stationary points of $f(\cdot)$, i.e.,

$$S = \{\theta \in \mathbb{R}^{d_\theta} : \nabla f(\theta) = 0\}.$$
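As an illustration of recursion (2.1), the following minimal sketch runs a stochastic gradient search on an objective whose minima form a whole circle, so none of them is isolated. The objective $f(\theta) = (\|\theta\|^2 - 1)^2/4$, the step sizes $\alpha_n = 0.2/(n+1)^{0.8}$, the Gaussian noise and all function names are assumptions made for this example only; they do not come from the paper.

```python
import math
import random

def grad_f(theta):
    # Assumed objective: f(theta) = (||theta||^2 - 1)^2 / 4,
    # so grad f(theta) = (||theta||^2 - 1) * theta.
    # Every point of the unit circle is a (non-isolated) minimum.
    s = theta[0] ** 2 + theta[1] ** 2 - 1.0
    return [s * theta[0], s * theta[1]]

def stochastic_gradient_search(theta0, steps, noise_std=0.1, seed=0):
    rng = random.Random(seed)
    theta = list(theta0)
    for n in range(steps):
        alpha_n = 0.2 / (n + 1) ** 0.8                      # illustrative step sizes
        g = grad_f(theta)
        xi = [rng.gauss(0.0, noise_std) for _ in theta]     # i.i.d. zero-mean noise xi_n
        # theta_{n+1} = theta_n - alpha_n * (grad f(theta_n) + xi_n)
        theta = [th - alpha_n * (gi + x) for th, gi, x in zip(theta, g, xi)]
    return theta

theta_final = stochastic_gradient_search([1.5, 0.5], steps=20000)
radius = math.hypot(theta_final[0], theta_final[1])          # distance from the origin
```

In this sketch the iterates settle near the unit circle; which point of the circle they approach depends on the noise realization.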



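Sequence $\{\gamma_n\}_{n \geq 0}$ is defined by $\gamma_0 = 0$ and

$$\gamma_n = \sum_{i=0}^{n-1} \alpha_i$$

for $n \geq 1$. For $t \in (0, \infty)$ and $n \geq 0$, $a(n,t)$ is the integer defined as

$$a(n,t) = \max\{k \geq n : \gamma_k - \gamma_n \leq t\}.$$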
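The two quantities just defined can be computed directly from a step-size sequence. The sketch below is only an illustration: the helper names `gamma` and `a` and the harmonic step sizes $\alpha_i = 1/(i+1)$ are assumptions made for the example.

```python
def gamma(alpha, n):
    """gamma_n = alpha(0) + ... + alpha(n-1), with gamma_0 = 0."""
    return sum(alpha(i) for i in range(n))

def a(alpha, n, t):
    """Largest k >= n with gamma_k - gamma_n <= t.

    The loop terminates only if the step sizes sum to infinity.
    """
    k, elapsed = n, 0.0              # elapsed = gamma_k - gamma_n
    while elapsed + alpha(k) <= t:
        elapsed += alpha(k)
        k += 1
    return k

def harmonic(i):
    return 1.0 / (i + 1)             # hypothetical step sizes alpha_i = 1/(i+1)

k_from_0 = a(harmonic, 0, 1.0)       # gamma_1 - gamma_0 = 1 <= 1 < gamma_2 - gamma_0
k_from_1 = a(harmonic, 1, 1.0)       # gamma_3 - gamma_1 = 5/6 <= 1 < gamma_4 - gamma_1
```

Intuitively, $a(n,t)$ is the last iteration reached when the algorithm runs for $t$ units of "algorithmic time" starting at iteration $n$.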



Algorithm (2.1) is analyzed under the following assumptions:

Assumption 2.1. $\lim_{n \to \infty} \alpha_n = 0$ and $\sum_{n=0}^{\infty} \alpha_n = \infty$.

Assumption 2.2. There exists a real number $r \in (1, \infty)$ such that

$$\xi = \limsup_{n \to \infty} \max_{n \leq k < a(n,1)} \left\| \sum_{i=n}^{k} \alpha_i \gamma_i^{r} \xi_i \right\| < \infty$$

w.p.1 on $\{\sup_{n \geq 0} \|\theta_n\| < \infty\}$.

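Assumption 2.3. For any compact set $Q \subset \mathbb{R}^{d_\theta}$ and any $a \in f(Q)$, there exist
real numbers $\delta_{Q,a} \in (0,1]$, $\mu_{Q,a} \in (1,2]$, $M_{Q,a} \in [1,\infty)$ such that

$$|f(\theta) - a| \leq M_{Q,a} \|\nabla f(\theta)\|^{\mu_{Q,a}} \tag{2.2}$$

for all $\theta \in Q$ satisfying $|f(\theta) - a| \leq \delta_{Q,a}$.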

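Remark 2.1. As an immediate consequence of Assumption 2.3, we have that for
each $\theta \in \mathbb{R}^{d_\theta}$, there exist real numbers $\delta_\theta \in (0,1]$, $\mu_\theta \in (1,2]$, $M_\theta \in [1,\infty)$ such that

$$|f(\theta') - f(\theta)| \leq M_\theta \|\nabla f(\theta')\|^{\mu_\theta} \tag{2.3}$$

for all $\theta' \in \mathbb{R}^{d_\theta}$ satisfying $\|\theta' - \theta\| \leq \delta_\theta$. If $\theta \in S$, $\mu_\theta$ and $M_\theta$ can be selected as

[display not recoverable]

where $\varepsilon$ is a small positive constant (since $\{\theta_n\}_{n \geq 0}$ converges to $S$, the values of $\mu_\theta$,
$M_\theta$ for $\theta \notin S$ are not relevant to the problems studied in the paper). Moreover, if
$Q \subseteq \{\theta' \in \mathbb{R}^{d_\theta} : \|\theta' - \theta\| \leq \delta_\theta\}$ and $a = f(\theta) \in f(Q)$ for some $\theta \in \mathbb{R}^{d_\theta}$, $\mu_{Q,a}$ and $M_{Q,a}$
can be selected as $\mu_{Q,a} = \mu_\theta$, $M_{Q,a} = M_\theta$.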

Remark 2.2. In order for Assumption 2.3 to be true, it is sufficient that
the assumption holds locally in an open vicinity of $S$, i.e., that there exists an open
set $V \supset S$ with the following property: For any compact set $Q \subset V$ and any $a \in f(Q)$,
there exist real numbers $\delta_{Q,a} \in (0,1]$, $\mu_{Q,a} \in (1,2]$, $M_{Q,a} \in [1,\infty)$ such that (2.2) holds
for all $\theta \in Q$ satisfying $|f(\theta) - a| \leq \delta_{Q,a}$ (see Appendix for details).

Assumption 2.1 corresponds to the sequence $\{\alpha_n\}_{n \geq 0}$ and is widely used in the
asymptotic analysis of stochastic gradient and stochastic approximation algorithms.
Assumption 2.2 is a noise condition. In this or a similar form, it is involved in most of
the results on the convergence and convergence rate of stochastic gradient search and
stochastic approximation. It holds for algorithms with Markovian dynamics (see the
next section). It is also satisfied when $\{\xi_n\}_{n \geq 0}$ is a martingale-difference sequence.
Assumption 2.3 is related to the stability of the gradient flow $d\theta/dt = -\nabla f(\theta)$, or
more specifically, to the geometry of the set of stationary points $S$. In the area of
differential geometry, relations (2.2) and (2.3) are known as the Lojasiewicz gradient
inequality (see [TB] and [H] for details). They hold if $f(\cdot)$ is analytic or subanalytic
in an open vicinity of $S$ (see [H] for the proof; for the form of the Lojasiewicz
inequality appearing in Assumption 2.3 and (2.2), see [13, Theorem LI, page 775]; for
the definition and properties of analytic and subanalytic functions, consult [5], [12]).
Although analyticity and subanalyticity are fairly strong conditions, they hold for the
objective functions of many stochastic gradient algorithms used in the areas of system
identification, signal processing, machine learning, operations research and statistical
inference. E.g., in this paper, we show that the objective functions associated with
supervised learning and recursive prediction error identification are analytic
(Sections 4 and 5). Moreover, in [28] (an extended version of this paper), we demonstrate
the same property for temporal-difference learning algorithms. Furthermore, in [29],
we show analyticity for the objective functions associated with recursive identification
methods for hidden Markov models. It is also worth mentioning that the objective
functions associated with recursive algorithms for principal and independent component
analysis (as well as with many other adaptive signal processing algorithms) are
usually polynomial or rational, and hence, analytic, too (see e.g., [S] and references
cited therein).
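To make the role of the exponent in the Lojasiewicz gradient inequality concrete, the following sketch checks the inequality numerically for a degenerate scalar minimum. The function $f(\theta) = \theta^4$, the constant $M = 1$ and the grid are assumptions chosen for illustration. For this function $|f(\theta) - f(0)| = \theta^4$ and $|f'(\theta)| = 4|\theta|^3$, so the inequality holds with exponent $4/3$ but fails with exponent $2$ arbitrarily close to the stationary point.

```python
def lojasiewicz_holds(mu, M=1.0, grid=2001, half_width=1.0):
    """Check |f(theta) - f(0)| <= M * |f'(theta)|**mu on a grid around the
    stationary point theta = 0 of the assumed example f(theta) = theta**4."""
    for j in range(grid):
        theta = -half_width + 2.0 * half_width * j / (grid - 1)
        f_gap = theta ** 4                     # |f(theta) - f(0)|
        grad_norm = abs(4.0 * theta ** 3)      # |f'(theta)|
        if f_gap > M * grad_norm ** mu:
            return False
    return True

holds_with_mu_4_3 = lojasiewicz_holds(mu=4.0 / 3.0)   # exponent 4/3 works everywhere on the grid
holds_with_mu_2 = lojasiewicz_holds(mu=2.0)           # exponent 2 fails close to theta = 0
```

A non-degenerate quadratic minimum would admit the exponent $2$; the flatter the function near its stationary points, the smaller the admissible exponent.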

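In order to state the main results of this section, we need further notation. For
$\theta \in \mathbb{R}^{d_\theta}$, $C_\theta \in [1,\infty)$ stands for an upper bound of $\|\nabla f(\cdot)\|$ on
$\{\theta' \in \mathbb{R}^{d_\theta} : \|\theta' - \theta\| \leq \delta_\theta\}$ and for a Lipschitz constant of $\nabla f(\cdot)$ on the same set.
Moreover, $p_\theta$, $q_\theta$ and $r_\theta$ are real numbers defined as

$$r_\theta = \begin{cases} 1/(2 - \mu_\theta), & \text{if } \mu_\theta < 2 \\ \infty, & \text{if } \mu_\theta = 2 \end{cases}, \qquad p_\theta = \mu_\theta \min\{r, r_\theta\}, \qquad q_\theta = \min\{r, r_\theta\} - 1 \tag{2.4}$$

($\delta_\theta$, $\mu_\theta$ are specified in Remark 2.1).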
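The exponents in (2.4) are simple to tabulate. The sketch below assumes that (2.4) reads $r_\theta = 1/(2 - \mu_\theta)$ for $\mu_\theta < 2$, $r_\theta = \infty$ for $\mu_\theta = 2$, $p_\theta = \mu_\theta \min\{r, r_\theta\}$ and $q_\theta = \min\{r, r_\theta\} - 1$; the function name `rate_exponents` is hypothetical.

```python
import math

def rate_exponents(mu, r):
    """Return (r_theta, p_theta, q_theta) under the reading of (2.4) stated
    above, for a Lojasiewicz exponent mu in (1, 2] and a noise exponent r > 1."""
    if not (1.0 < mu <= 2.0 and r > 1.0):
        raise ValueError("expected mu in (1, 2] and r > 1")
    r_theta = math.inf if mu == 2.0 else 1.0 / (2.0 - mu)
    m = min(r, r_theta)
    return r_theta, mu * m, m - 1.0

quadratic_case = rate_exponents(2.0, 1.5)          # mu = 2: the noise exponent r decides
degenerate_case = rate_exponents(4.0 / 3.0, 3.0)   # mu = 4/3: the flat geometry decides
```

In the first call the minimum is attained by $r$, in the second by $r_\theta$, which shows the two regimes (noise-limited and geometry-limited) that the exponents distinguish.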

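Our main results on the convergence and convergence rate of the recursion (2.1)
are contained in the next two theorems.

Theorem 2.1 (Convergence). Let Assumptions 2.1 - 2.3 hold. Then, $\hat\theta = \lim_{n \to \infty} \theta_n$
exists and satisfies $\nabla f(\hat\theta) = 0$ w.p.1 on $\{\sup_{n \geq 0} \|\theta_n\| < \infty\}$.

Theorem 2.2 (Convergence Rate). Let Assumptions 2.1 - 2.3 hold. Then, there
exists a random variable $\hat K$ (which is a deterministic function of $\hat p$, $C_{\hat\theta}$, $M_{\hat\theta}$) such that
$1 \leq \hat K < \infty$ everywhere and such that the following is true:

$$\limsup_{n \to \infty} \gamma_n^{\hat p} \|\nabla f(\theta_n)\|^2 \leq \hat K (\phi(\xi))^{\hat\mu}, \tag{2.5}$$

$$\limsup_{n \to \infty} \gamma_n^{\hat p} |f(\theta_n) - f(\hat\theta)| \leq \hat K (\phi(\xi))^{\hat\mu}, \tag{2.6}$$

$$\limsup_{n \to \infty} \gamma_n^{\hat q} \|\theta_n - \hat\theta\| \leq \hat K \phi(\xi) \tag{2.7}$$

w.p.1 on $\{\sup_{n \geq 0} \|\theta_n\| < \infty\}$, where $\hat\mu = \mu_{\hat\theta}$, $\hat p = p_{\hat\theta}$, $\hat q = q_{\hat\theta}$, $\hat r = r_{\hat\theta}$ and

[display defining $\phi(\xi)$ not recoverable]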
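The proofs are provided in Section 6. As an immediate consequence of the previous
theorems, we get the following corollary:

Corollary 2.1. Let Assumptions 2.1 - 2.3 hold. Then, the following is true:

(i) $\|\nabla f(\theta_n)\|^2 = o(\gamma_n^{-\hat p})$, $|f(\theta_n) - f(\hat\theta)| = o(\gamma_n^{-\hat p})$ and $\|\theta_n - \hat\theta\| = o(\gamma_n^{-\hat q})$ w.p.1
on $\{\sup_{n \geq 0} \|\theta_n\| < \infty\} \cap \{\xi = 0, \hat r > r\}$.

(ii) $\|\nabla f(\theta_n)\|^2 = O(\gamma_n^{-\hat p})$, $|f(\theta_n) - f(\hat\theta)| = O(\gamma_n^{-\hat p})$ and $\|\theta_n - \hat\theta\| = O(\gamma_n^{-\hat q})$
w.p.1 on $\{\sup_{n \geq 0} \|\theta_n\| < \infty\} \cap \{\xi = 0, \hat r > r\}^c$.

(iii) $\|\nabla f(\theta_n)\|^2 = o(\gamma_n^{-p})$ and $|f(\theta_n) - f(\hat\theta)| = o(\gamma_n^{-p})$ w.p.1 on
$\{\sup_{n \geq 0} \|\theta_n\| < \infty\}$, where $p = \min\{1, r\}$.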

In the literature on stochastic and deterministic optimization, the asymptotic
behavior of gradient search is usually characterized by the convergence of sequences
$\{\nabla f(\theta_n)\}_{n \geq 0}$, $\{f(\theta_n)\}_{n \geq 0}$ and $\{\theta_n\}_{n \geq 0}$ (see e.g., [3], g], [13], [H] and references
quoted therein). Similarly, the convergence rate can be described by the rates at
which $\{\nabla f(\theta_n)\}_{n \geq 0}$, $\{f(\theta_n)\}_{n \geq 0}$ and $\{\theta_n\}_{n \geq 0}$ tend to the sets of their limit points.
In the case of algorithm (2.1), this kind of information is provided by Theorems 2.1,
2.2 and Corollary 2.1. Theorem 2.1 claims that almost surely, algorithm (2.1) is
point-convergent and does not exhibit limit cycles. Theorem 2.2 and Corollary 2.1 provide
relatively tight upper bounds on the convergence rate of $\{\nabla f(\theta_n)\}_{n \geq 0}$, $\{f(\theta_n)\}_{n \geq 0}$
and $\{\theta_n\}_{n \geq 0}$. These bounds can be thought of as a combination of the convergence
rate of the gradient flow $d\theta/dt = -\nabla f(\theta)$ (characterized by the Lojasiewicz exponent
$\mu_{\hat\theta}$) and the rate of the noise averages $\sum_{i=n}^{k} \alpha_i \xi_i$ (expressed through parameter $r$ and
sequence $\{\gamma_n\}_{n \geq 0}$). Basically, Theorem 2.2 and Corollary 2.1 claim that the convergence
rate of $\{\|\nabla f(\theta_n)\|^2\}_{n \geq 0}$ and $\{f(\theta_n)\}_{n \geq 0}$ is the slower of the rates $O(\gamma_n^{-\hat\mu \hat r})$ (the
rate of the gradient flow $d\theta/dt = -\nabla f(\theta)$ sampled at instants $\{\gamma_n\}_{n \geq 0}$) and $O(\gamma_n^{-\hat\mu r})$
(the rate of the noise averages $\max_{k \geq n} \|\sum_{i=n}^{k} \alpha_i \xi_i\|^{\hat\mu}$).

Apparently, the results of Theorems 2.1, 2.2 and Corollary 2.1 are of a local
nature: They hold only on the event where algorithm (2.1) is stable (i.e., where
sequence $\{\theta_n\}_{n \geq 0}$ is bounded). Stating results on the convergence and convergence
rate in such a local form is quite sensible for the following reasons. The stability
of stochastic gradient search is based on well-understood arguments which are rather
different from the arguments used in the analysis of the convergence and convergence
rate. Moreover and more importantly, it is straightforward to get a global version of
the results provided in Theorems 2.1, 2.2 and Corollary 2.1 by combining the theorems
with the methods used to verify or ensure the stability (e.g., with the results of [6]
and 0).

The point-convergence and convergence rate of stochastic gradient search (and
stochastic approximation) have been the subject of a large number of papers and
books (see [1], [M], [16], [21], [26] and references cited therein). Although the
existing results provide a good insight into the asymptotic behavior and efficiency
of stochastic gradient algorithms, they are based on fairly restrictive assumptions:
Literally, they all require the objective function $f(\cdot)$ to have an isolated minimum $\theta_*$
(sometimes even to be strongly unimodal) such that the Hessian $\nabla^2 f(\theta_*)$ is strictly
positive definite and such that $\{\theta_n\}_{n \geq 0}$ visits the attraction domain of $\theta_*$ infinitely many
times w.p.1. Unfortunately, in the case of high-dimensional, highly nonlinear stochastic
gradient algorithms (such as online machine learning and recursive identification),
it is hard (if not impossible) to show even the existence of an isolated minimum,
let alone the definiteness of $\nabla^2 f(\theta_*)$ and the infinitely many visits of $\{\theta_n\}_{n \geq 0}$
to the attraction domain of $\theta_*$. Moreover and more importantly, these requirements
are unlikely to be satisfied by a high-dimensional, highly nonlinear algorithm, as the
objective function associated with such an algorithm is prone to manifolds of
(non-isolated) minima and (non-isolated) saddles, each of which is a potential limit point of
the algorithm iterates (e.g., a recursive prediction error identification method exhibits
this behavior when the candidate models are overparameterized or do not match the
true system). Relying on the Lojasiewicz gradient inequality, Theorems 2.1, 2.2 and
Corollary 2.1 overcome the described difficulties: Both theorems and their corollary
allow the objective function $f(\cdot)$ to have multiple, non-isolated minima, impose no
restriction on the values of $\nabla^2 f(\cdot)$ (notice that $\nabla^2 f(\cdot)$ cannot be strictly definite at a
non-isolated minimum or maximum) and do not require (a priori) $\{\theta_n\}_{n \geq 0}$ to exhibit
any particular behavior (i.e., to visit infinitely often the attraction domain of an
isolated minimum). Moreover, they cover a broad class of complex stochastic gradient
algorithms (see Sections 4 and 5; see also [55], [22]). To the best of our knowledge,
these are the only results on the convergence and convergence rate of stochastic search
which enjoy such features.

Regarding the results of Theorems 2.1, 2.2 and Corollary 2.1, it is worth mentioning
that they are not just a combination of the Lojasiewicz inequality and the existing
techniques for the asymptotic analysis of stochastic gradient search and stochastic
approximation. On the contrary, the existing techniques seem to be completely
inapplicable to high-dimensional, highly nonlinear stochastic gradient search. The reason
is that these techniques crucially rely on the following Lyapunov function:

$$w(\theta) = (\theta - \theta_*)^T \nabla^2 f(\theta_*) (\theta - \theta_*),$$

where $\theta_*$ is an isolated minimum such that $\nabla^2 f(\theta_*)$ is strictly positive definite and
such that the attraction domain of $\theta_*$ is visited by $\{\theta_n\}_{n \geq 0}$ infinitely many times
w.p.1. In this paper, we take an entirely different approach whose main steps can be
summarized as follows:

1. The convergence of $\{f(\theta_n)\}_{n \geq 0}$ is demonstrated.

2. A 'singular' Lyapunov function

$$v(\theta) = \begin{cases} (f(\theta) - \hat f)^{-1/p}, & \text{if } f(\theta) > \hat f \\ 0, & \text{otherwise} \end{cases}$$

is constructed, where $\hat f = \lim_{n \to \infty} f(\theta_n)$ and $p$ is a suitable positive constant. Relying
on this function, the convergence rate of $\{f(\theta_n)\}_{n \geq 0}$ and $\{\nabla f(\theta_n)\}_{n \geq 0}$ is evaluated.

3. Using the results derived at Step 2, the convergence rate of $\sup_{k \geq n} \|\theta_k - \theta_n\|$
is assessed.

4. Applying the results of Step 3, the point-convergence of $\{\theta_n\}_{n \geq 0}$ is demonstrated.
Then, refining the convergence rates derived at Steps 2 and 3, the results of
Theorem 2.2 are obtained.

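At the core of our approach is the singular Lyapunov function $v(\cdot)$. Although
subtle techniques are needed to handle such a function (see Section 6), $v(\cdot)$ provides
an intuitively clear explanation of the results of Theorem 2.2 and Corollary 2.1. The
explanation is based on the heuristic analysis of the following two cases.¹

Case 2.1: $\liminf_{n \to \infty} \gamma_n^{\hat\mu r} (f(\theta_n) - \hat f) = -\infty$ and $\sup_{n \geq 0} \|\theta_n\| < \infty$, where $\hat\mu$ is
defined in Theorem 2.2.

In this case, there exists an increasing integer sequence $\{n_k\}_{k \geq 0}$ such that
$\lim_{k \to \infty} \gamma_{n_k}^{\hat\mu r} (f(\theta_{n_k}) - \hat f) = -\infty$. Owing to Assumption 2.3, we have

$$\|\nabla f(\theta_n)\| \geq \left( \frac{|f(\theta_n) - \hat f|}{\hat M} \right)^{1/\hat\mu} \tag{2.8}$$

for sufficiently large $n$, where $\hat M = M_{\hat\theta}$. Consequently, $\lim_{k \to \infty} \gamma_{n_k}^{r} \|\nabla f(\theta_{n_k})\| = \infty$.
On the other side, Taylor formula yields

$$\begin{aligned}
f(\theta_n) &\approx f(\theta_{n_k}) - (\gamma_n - \gamma_{n_k}) \|\nabla f(\theta_{n_k})\|^2 - (\nabla f(\theta_{n_k}))^T \sum_{i=n_k}^{n-1} \alpha_i \xi_i \\
&\leq f(\theta_{n_k}) - \|\nabla f(\theta_{n_k})\| \left( (\gamma_n - \gamma_{n_k}) \|\nabla f(\theta_{n_k})\| - \left\| \sum_{i=n_k}^{n-1} \alpha_i \xi_i \right\| \right)
\end{aligned}$$

for $n \geq n_k$ and sufficiently large $k \geq 0$. Since $\gamma_n - \gamma_{n_k} \geq 1$ for $n > a(n_k, 1)$ and since

$$\sup_{k \geq n} \left\| \sum_{i=n}^{k} \alpha_i \xi_i \right\| = O(\gamma_n^{-r}) \tag{2.9}$$

when $n \to \infty$, we get $f(\theta_n) < f(\theta_{n_k}) < \hat f$ for $n > a(n_k, 1)$ and sufficiently large $k \geq 0$.
However, this is not possible as $\lim_{n \to \infty} f(\theta_n) = \hat f$. Thus, Case 2.1 cannot occur.

¹ Throughout this analysis, we assume that Theorem 2.1 is true. We also assume
$\sup_{k \geq n} \|\sum_{i=n}^{k} \alpha_i \xi_i\| = O(\gamma_n^{-r})$ when $n \to \infty$, which is slightly stronger than what Assumption
2.2 and Lemma 6.1 yield.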

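Case 2.2: $\limsup_{n \to \infty} \gamma_n^{p} (f(\theta_n) - \hat f) = \infty$, $p < \hat\mu \min\{r, \hat r\}$ and $\sup_{n \geq 0} \|\theta_n\| < \infty$,
where $\hat\mu$, $\hat r$ are defined in Theorem 2.2.

Similarly as in the previous case, there exists an increasing integer sequence
$\{n_k\}_{k \geq 0}$ such that $\lim_{k \to \infty} \gamma_{n_k}^{p} (f(\theta_{n_k}) - \hat f) = \infty$. Then, (2.8) implies

$$\lim_{k \to \infty} \gamma_{n_k}^{r} \|\nabla f(\theta_{n_k})\| \geq \lim_{k \to \infty} \left( \frac{\gamma_{n_k}^{p} (f(\theta_{n_k}) - \hat f)}{\hat M} \right)^{1/\hat\mu} = \infty \tag{2.10}$$

(notice that $p/\hat\mu < r$). On the other side, Taylor formula and (2.8) yield

$$\begin{aligned}
v(\theta_n) &\approx v(\theta_{n_k}) + \frac{(\nabla f(\theta_{n_k}))^T}{p (f(\theta_{n_k}) - \hat f)^{1 + 1/p}} \sum_{i=n_k}^{n-1} \alpha_i (\nabla f(\theta_i) + \xi_i) \\
&\geq v(\theta_{n_k}) + \frac{\|\nabla f(\theta_{n_k})\|}{p (f(\theta_{n_k}) - \hat f)^{1 + 1/p}} \left( (\gamma_n - \gamma_{n_k}) \|\nabla f(\theta_{n_k})\| - \left\| \sum_{i=n_k}^{n-1} \alpha_i \xi_i \right\| \right)
\end{aligned} \tag{2.11}$$

for $n \geq n_k$ and sufficiently large $k \geq 0$. Since $\lim_{n \to \infty} f(\theta_n) = \hat f$ and since

$$1 + 1/p - 2/\hat\mu = 1/p - 1/(\hat\mu \hat r) > 0,$$

relations (2.8) - (2.11) imply

$$v(\theta_n) \geq v(\theta_{n_k}) + \frac{\gamma_n - \gamma_{n_k}}{2 p \hat M^{2/\hat\mu} (f(\theta_{n_k}) - \hat f)^{1 + 1/p - 2/\hat\mu}} \geq v(\theta_{n_k}) + N (\gamma_n - \gamma_{n_k})$$

for $n > a(n_k, 1)$, sufficiently large $k \geq 0$ and $N = 1/(2 p \hat M^{2/\hat\mu})$. Consequently,

$$f(\theta_n) - \hat f \leq \left( v(\theta_{n_k}) + N (\gamma_n - \gamma_{n_k}) \right)^{-p} \tag{2.12}$$

for $n > a(n_k, 1)$ and sufficiently large $k \geq 0$. However, this is impossible, as (2.12)
yields $\limsup_{n \to \infty} \gamma_n^{p} (f(\theta_n) - \hat f) < \infty$. Hence, Case 2.2 cannot happen.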

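As neither Case 2.1 nor Case 2.2 is possible, we conclude that $f(\theta_n)$ converges to $\hat f$
at the rate $O(\gamma_n^{-\hat p})$. Since $\gamma_k - \gamma_n \geq 1$ for $k > a(n, 1)$ and since

$$\begin{aligned}
f(\theta_k) - f(\theta_n) &\approx -(\gamma_k - \gamma_n) \|\nabla f(\theta_n)\|^2 - (\nabla f(\theta_n))^T \sum_{i=n}^{k-1} \alpha_i \xi_i \\
&\leq -\left( (\gamma_k - \gamma_n) - 1/2 \right) \|\nabla f(\theta_n)\|^2 + \frac{1}{2} \left\| \sum_{i=n}^{k-1} \alpha_i \xi_i \right\|^2
\end{aligned}$$

for $k \geq n$ and sufficiently large $n \geq 0$, we deduce

$$\|\nabla f(\theta_n)\|^2 \leq -2 (f(\theta_k) - f(\theta_n)) + \left\| \sum_{i=n}^{k-1} \alpha_i \xi_i \right\|^2$$

for $k > a(n, 1)$ and sufficiently large $n \geq 0$. As an immediate consequence, we
have that $\|\nabla f(\theta_n)\|^2$ converges to zero at the rate $O(\gamma_n^{-\hat p})$. The evaluation of the
convergence rate of $\{\theta_n\}_{n \geq 0}$ is much more complicated (so that it cannot briefly be
summarized here; the details are provided in Lemmas 6.6, 6.10) and is based on
the following reasoning: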

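
$$\begin{aligned}
\|\theta_k - \theta_n\| &\leq \left\| \sum_{i=n}^{k-1} \alpha_i \nabla f(\theta_i) \right\| + \left\| \sum_{i=n}^{k-1} \alpha_i \xi_i \right\| \\
&\approx (\gamma_k - \gamma_n) \|\nabla f(\theta_n)\| + \left\| \sum_{i=n}^{k-1} \alpha_i \xi_i \right\| \\
&\leq -\frac{f(\theta_k) - f(\theta_n)}{\|\nabla f(\theta_n)\|} + 2 \left\| \sum_{i=n}^{k-1} \alpha_i \xi_i \right\|,
\end{aligned}$$

where $k \geq n$ and $n \geq 0$ is sufficiently large.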

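The heuristic analysis of Cases 2.1 and 2.2 carried out above indicates that the
convergence rates of $\{f(\theta_n)\}_{n \geq 0}$ and $\{\nabla f(\theta_n)\}_{n \geq 0}$ reported in Theorem 2.2 are rather
tight (if not optimal; for the discussion on the tightness of the rate of $\{\theta_n\}_{n \geq 0}$, see
Remark 6.4). The same conclusion is suggested by the following two special cases:

Case 2.3: $\xi_n = 0$ for each $n \geq 0$.

Due to Assumption 2.3 and (2.8), we have

$$\frac{d (f(\theta(t)) - \hat f)}{dt} = -\|\nabla f(\theta(t))\|^2 \leq -\left( \frac{f(\theta(t)) - \hat f}{\hat M} \right)^{2/\hat\mu}$$

for a solution $\theta(\cdot)$ of $d\theta/dt = -\nabla f(\theta)$ satisfying $\lim_{t \to \infty} f(\theta(t)) = \hat f$ and
$\theta([0, \infty)) \subseteq \{\theta \in \mathbb{R}^{d_\theta} : \|\theta - \hat\theta\| \leq \delta_{\hat\theta}\}$. Consequently,

$$f(\theta(t)) - \hat f = O(t^{-\hat\mu/(2 - \hat\mu)}) = O(t^{-\hat\mu \hat r}).$$

As $\{\theta_n\}_{n \geq 0}$ is asymptotically equivalent to $\theta(\cdot)$ sampled at instants $\{\gamma_n\}_{n \geq 0}$, we get
$f(\theta_n) - \hat f = O(\gamma_n^{-\hat\mu \hat r})$. The same result is implied by Theorem 2.2 and Corollary 2.1.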
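The noise-free rate $O(t^{-\hat\mu/(2 - \hat\mu)})$ can be checked numerically on a simple example. The sketch below assumes $f(\theta) = \theta^4$ (so the Lojasiewicz exponent is $4/3$ and the predicted rate is $t^{-2}$) and integrates the gradient flow with an explicit Euler scheme. The function, the step `dt` and the horizon are illustrative choices. For this $f$ the flow has the closed form $\theta(t) = \theta_0 / \sqrt{1 + 8 \theta_0^2 t}$, so $t^2 f(\theta(t))$ tends to $1/64$.

```python
def gradient_flow_theta4(theta0=1.0, t_end=100.0, dt=1e-3):
    """Explicit Euler integration of d(theta)/dt = -f'(theta) for the
    assumed example f(theta) = theta**4, i.e. f'(theta) = 4 * theta**3."""
    theta = theta0
    for _ in range(int(round(t_end / dt))):
        theta -= dt * 4.0 * theta ** 3
    return theta

t_end = 100.0
theta_t = gradient_flow_theta4(t_end=t_end)
# t^2 * (f(theta(t)) - f_hat) with f_hat = 0; stays bounded if the rate is t^{-2}
scaled_gap = t_end ** 2 * theta_t ** 4
```

A bounded, non-vanishing value of `scaled_gap` is consistent with the gap $f(\theta(t)) - \hat f$ decaying exactly like $t^{-2}$ in this example.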

Case 2.4: $f(\theta) = \theta^T A \theta$, where $A$ is a strictly positive definite matrix.

Recursion (2.1) reduces to a linear stochastic approximation algorithm in this
case. For such an algorithm, the tightest bound on the convergence rate of $\{f(\theta_n)\}_{n \geq 0}$
and $\{\|\nabla f(\theta_n)\|^2\}_{n \geq 0}$ is $O(\gamma_n^{-2r})$ if $\xi > 0$ and $o(\gamma_n^{-2r})$ if $\xi = 0$ (see [27]). The same
rate is predicted by Theorem 2.2 and Corollary 2.1.



3. Stochastic Gradient Algorithms with Markovian Dynamics. In order
to illustrate the results of Section 2 and to set up a framework for the analysis carried
out in Sections 4 and 5, we apply Theorems 2.1, 2.2 and Corollary 2.1 to stochastic
gradient algorithms with Markovian dynamics. These algorithms are defined by the
following difference equation:

$$\theta_{n+1} = \theta_n - \alpha_n F(\theta_n, Z_{n+1}), \quad n \geq 0. \tag{3.1}$$

In this recursion, $F : \mathbb{R}^{d_\theta} \times \mathbb{R}^{d_z} \to \mathbb{R}^{d_\theta}$ is a Borel-measurable function, while $\{\alpha_n\}_{n \geq 0}$
is a sequence of positive real numbers. $\theta_0$ is an $\mathbb{R}^{d_\theta}$-valued random variable defined
on a probability space $(\Omega, \mathcal{F}, P)$, while $\{Z_n\}_{n \geq 0}$ is an $\mathbb{R}^{d_z}$-valued stochastic process
defined on the same probability space. $\{Z_n\}_{n \geq 0}$ is a Markov process controlled by
$\{\theta_n\}_{n \geq 0}$, i.e., there exists a family of transition probability kernels $\{\Pi_\theta(\cdot, \cdot)\}_{\theta \in \mathbb{R}^{d_\theta}}$
(defined on $\mathbb{R}^{d_z}$) such that

$$P(Z_{n+1} \in B \mid \theta_0, Z_0, \dots, \theta_n, Z_n) = \Pi_{\theta_n}(Z_n, B)$$

w.p.1 for any Borel-measurable set $B \subseteq \mathbb{R}^{d_z}$ and $n \geq 0$. In the context of stochastic
gradient search, $F(\theta_n, Z_{n+1})$ is regarded as an estimator of $\nabla f(\theta_n)$.
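A minimal sketch of a recursion of the form $\theta_{n+1} = \theta_n - \alpha_n F(\theta_n, Z_{n+1})$ driven by a Markov chain is given below. Everything specific in it is an assumption made for illustration: the objective $f(\theta) = \theta^2/2$, the estimator $F(\theta, z) = \theta + z$, the AR(1) chain for $Z_n$ (whose transition kernel here does not depend on $\theta$, a special case of the controlled setting) and the step sizes $\alpha_n = (n+1)^{-0.9}$.

```python
import random

def markovian_sgd(steps=20000, rho=0.5, seed=1):
    """theta_{n+1} = theta_n - alpha_n * F(theta_n, Z_{n+1}) with the assumed
    estimator F(theta, z) = theta + z, where Z is an AR(1) chain with zero
    stationary mean, so F averages to the gradient of f(theta) = theta**2 / 2."""
    rng = random.Random(seed)
    theta, z = 2.0, 0.0
    for n in range(steps):
        z = rho * z + rng.gauss(0.0, 1.0)       # Markov state Z_{n+1}, correlated in time
        alpha_n = 1.0 / (n + 1) ** 0.9          # illustrative step sizes
        theta -= alpha_n * (theta + z)          # F(theta_n, Z_{n+1})
    return theta

theta_final = markovian_sgd()
```

Although the noise $Z_{n+1}$ is correlated across iterations rather than a martingale difference, the decreasing step sizes average it out and the iterate approaches the stationary point $\theta = 0$.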

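The algorithm (3.1) is analyzed under the following assumptions.

Assumption 3.1. $\lim_{n \to \infty} \alpha_n = 0$, $\limsup_{n \to \infty} |\alpha_{n+1}^{-1} - \alpha_n^{-1}| < \infty$ and $\sum_{n=0}^{\infty} \alpha_n = \infty$.
There exists a real number $r \in (1, \infty)$ such that $\sum_{n=0}^{\infty} \alpha_n^2 \gamma_n^{2r} < \infty$.

Assumption 3.2. There exist a differentiable function $f : \mathbb{R}^{d_\theta} \to \mathbb{R}$ and a
Borel-measurable function $\tilde F : \mathbb{R}^{d_\theta} \times \mathbb{R}^{d_z} \to \mathbb{R}^{d_\theta}$ such that $\nabla f(\cdot)$ is locally Lipschitz
continuous and such that

$$F(\theta, z) - \nabla f(\theta) = \tilde F(\theta, z) - (\Pi \tilde F)(\theta, z)$$

for each $\theta \in \mathbb{R}^{d_\theta}$, $z \in \mathbb{R}^{d_z}$, where $(\Pi \tilde F)(\theta, z) = \int \tilde F(\theta, z') \Pi_\theta(z, dz')$.

Assumption 3.3. For any compact set $Q \subset \mathbb{R}^{d_\theta}$ and $s \in (0, 1)$, there exists a
Borel-measurable function $\varphi_{Q,s} : \mathbb{R}^{d_z} \to [1, \infty)$ such that

$$\max\{\|F(\theta, z)\|, \|\tilde F(\theta, z)\|, \|(\Pi \tilde F)(\theta, z)\|\} \leq \varphi_{Q,s}(z),$$

$$\|(\Pi \tilde F)(\theta', z) - (\Pi \tilde F)(\theta'', z)\| \leq \varphi_{Q,s}(z) \|\theta' - \theta''\|^{s}$$

for all $\theta, \theta', \theta'' \in Q$, $z \in \mathbb{R}^{d_z}$.

Assumption 3.4. Given a compact set $Q \subset \mathbb{R}^{d_\theta}$ and $s \in (0, 1)$,

$$\sup_{n \geq 0} E\left( \varphi_{Q,s}(Z_n) I_{\{\tau_Q > n\}} \mid \theta_0 = \theta, Z_0 = z \right) < \infty$$

for all $\theta \in \mathbb{R}^{d_\theta}$, $z \in \mathbb{R}^{d_z}$, where $\tau_Q = \inf\{n \geq 0 : \theta_n \notin Q\}$.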

The main results on the convergence rate of recursion (3.1) are contained in the
next theorem.

Theorem 3.1. Let Assumptions 3.1 - 3.4 hold, and suppose that $f(\cdot)$ (introduced
in Assumption 3.2) satisfies Assumption 2.3. Then, the following is true:

(i) $\hat\theta = \lim_{n \to \infty} \theta_n$ exists and satisfies $\nabla f(\hat\theta) = 0$ w.p.1 on $\{\sup_{n \geq 0} \|\theta_n\| < \infty\}$.

(ii) $\|\nabla f(\theta_n)\|^2 = o(\gamma_n^{-\hat p})$, $|f(\theta_n) - f(\hat\theta)| = o(\gamma_n^{-\hat p})$ and $\|\theta_n - \hat\theta\| = o(\gamma_n^{-\hat q})$ w.p.1
on $\{\sup_{n \geq 0} \|\theta_n\| < \infty\} \cap \{\hat r > r\}$.

(iii) $\|\nabla f(\theta_n)\|^2 = O(\gamma_n^{-\hat p})$, $|f(\theta_n) - f(\hat\theta)| = O(\gamma_n^{-\hat p})$ and $\|\theta_n - \hat\theta\| = O(\gamma_n^{-\hat q})$
w.p.1 on $\{\sup_{n \geq 0} \|\theta_n\| < \infty\} \cap \{\hat r \leq r\}$.

(iv) $\|\nabla f(\theta_n)\|^2 = o(\gamma_n^{-p})$ and $|f(\theta_n) - f(\hat\theta)| = o(\gamma_n^{-p})$ w.p.1 on
$\{\sup_{n \geq 0} \|\theta_n\| < \infty\}$.

The proof is provided in Section 7. $p$, $\hat p$, $\hat q$ and $\hat r$ are defined in Theorem 2.2 and
Corollary 2.1.



Assumption 3.1 is related to the sequence $\{\alpha_n\}_{n \geq 0}$. It holds if $\alpha_n = 1/n^a$ for
$n \geq 1$, where $a \in (3/4, 1]$ is a constant (in that case, $\gamma_n = O(n^{1-a})$ for $n \to \infty$,
while $r$ can be any number satisfying $1 < r < (a - 1/2)/(1 - a)$). On the other
side, Assumptions 3.2 - 3.4 correspond to the stochastic process $\{Z_n\}_{n \geq 0}$ and are
quite standard for the asymptotic analysis of stochastic approximation algorithms
with Markovian dynamics. Assumptions 3.2 - 3.4 have been introduced by Metivier
and Priouret in [5D] (see also [1, Part II]), and later generalized by Kushner and his
co-workers (see [14] and references cited therein). However, neither the results of Metivier
and Priouret, nor the results of Kushner and his co-workers provide any information
on the point-convergence and convergence rate of stochastic gradient search in the
case of multiple, non-isolated minima.
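The admissible range of $r$ for polynomially decaying step sizes follows from a short calculation, assuming the summability condition in Assumption 3.1 reads $\sum_n \alpha_n^2 \gamma_n^{2r} < \infty$. With $\alpha_n = n^{-a}$ and $\gamma_n$ of order $n^{1-a}$, the summand is of order $n^{-2a + 2r(1-a)}$, which is summable exactly when $2a - 2r(1-a) > 1$, i.e. $r < (a - 1/2)/(1 - a)$. Requiring this bound to exceed $1$ gives $a > 3/4$. The helper below (a hypothetical name, restricted to $a < 1$) just evaluates the bound.

```python
def max_noise_exponent(a):
    """Supremum of admissible r for alpha_n = n**(-a): r < (a - 1/2) / (1 - a).

    Only 3/4 < a < 1 is handled; the boundary case a = 1 is left out of this sketch.
    """
    if not (0.75 < a < 1.0):
        raise ValueError("expected 3/4 < a < 1")
    return (a - 0.5) / (1.0 - a)

r_max_fast_decay = max_noise_exponent(0.9)   # about 4: wide range of admissible r
r_max_slow_decay = max_noise_exponent(0.8)   # about 1.5: narrow range of admissible r
```

The closer $a$ is to $3/4$, the narrower the interval of admissible $r$.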

Regarding Theorem 3.1, the following note is also in order. As already mentioned
at the beginning of the section, the purpose of the theorem is to illustrate the results
of Theorem 2.1 and to provide a framework for studying the examples presented in
the next sections. Since these examples perfectly fit into the framework developed by
Metivier and Priouret, the more general assumptions and settings of [14] are not
considered here, in order to keep the exposition as concise as possible.

4. Example 1: Supervised Learning. In this section, online algorithms for
supervised learning in feedforward neural networks are analyzed using the results
of Theorems 2.2 and 3.1. To avoid unnecessary technical details and complicated
notation, only two-layer perceptrons are considered here. However, the obtained
results can be extended to other feedforward neural networks such as radial basis function
networks.

The input-output function of a two-layer perceptron can be defined as

$$G_\theta(x) = \sum_{i=1}^{M} a_i \psi\left( \sum_{j=1}^{N} b_{i,j} x_j \right).$$

Here, $\psi : \mathbb{R} \to \mathbb{R}$ is a differentiable function, while $M$ and $N$ are positive integers.
$a_1, \dots, a_M$, $b_{1,1}, \dots, b_{M,N}$ and $x_1, \dots, x_N$ are real numbers, while
$\theta = [a_1 \cdots a_M \; b_{1,1} \cdots b_{M,N}]^T$, $x = [x_1 \cdots x_N]^T$ and $d_\theta = M(N + 1)$. $\psi(\cdot)$
represents the network activation function, $x$ is the network input, while $G_\theta(x)$ is the
output. $\theta$ is the vector of the network parameters to be tuned through the process of
supervised learning.

Let $\pi(\cdot, \cdot)$ be a probability measure on $\mathbb{R}^N \times \mathbb{R}$, while

$$f(\theta) = \frac{1}{2} \int (y - G_\theta(x))^2 \pi(dx, dy)$$

for $\theta \in \mathbb{R}^{d_\theta}$. Then, the mean-square error based supervised learning in feedforward
neural networks can be described as the minimization of $f(\cdot)$ in a situation when only
samples from $\pi(\cdot, \cdot)$ are available. For more details on neural networks and supervised
learning, see e.g., [10], [11] and references cited therein.

Function $f(\cdot)$ is usually minimized by the following stochastic gradient algorithm:

$$\theta_{n+1} = \theta_n + \alpha_n (Y_n - G_{\theta_n}(X_n)) H_{\theta_n}(X_n), \quad n \geq 0. \tag{4.1}$$

In this recursion, $\{\alpha_n\}_{n \geq 0}$ is a sequence of positive real numbers, while $H_\theta(\cdot) =
\nabla_\theta G_\theta(\cdot)$. $\theta_0$ is an $\mathbb{R}^{d_\theta}$-valued random variable defined on a probability space $(\Omega, \mathcal{F}, P)$,
while $\{(X_n, Y_n)\}_{n \geq 0}$ is an $\mathbb{R}^N \times \mathbb{R}$-valued stochastic process defined on the same
probability space. In the context of supervised learning, $\{(X_n, Y_n)\}_{n \geq 0}$ is regarded
as a training sequence.
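The online learning recursion $\theta_{n+1} = \theta_n + \alpha_n (Y_n - G_{\theta_n}(X_n)) H_{\theta_n}(X_n)$ can be sketched for the smallest possible network. The example below assumes a single hidden unit and a scalar input ($M = N = 1$, so $G_\theta(x) = a \psi(b x)$), a logistic activation, inputs uniform on $[-1, 1]$, a hypothetical data-generating map $x \mapsto \psi(2x)$, and step sizes $\alpha_n = 0.5/(n+1)^{0.8}$. None of these choices come from the paper.

```python
import math
import random

def psi(u):
    return 1.0 / (1.0 + math.exp(-u))            # logistic activation

def G(theta, x):
    a, b = theta                                  # M = N = 1: G_theta(x) = a * psi(b * x)
    return a * psi(b * x)

def H(theta, x):
    a, b = theta                                  # gradient of G_theta(x) with respect to (a, b)
    p = psi(b * x)
    return [p, a * p * (1.0 - p) * x]

def mean_square_error(theta, target, grid=201):
    # Grid approximation of (1/2) * E[(Y - G_theta(X))^2] for X uniform on [-1, 1].
    xs = [-1.0 + 2.0 * j / (grid - 1) for j in range(grid)]
    return sum(0.5 * (target(x) - G(theta, x)) ** 2 for x in xs) / grid

def train(theta0, target, steps=5000, seed=0):
    rng = random.Random(seed)
    theta = list(theta0)
    for n in range(steps):
        x = rng.uniform(-1.0, 1.0)                # bounded input X_n
        y = target(x)                             # bounded output Y_n
        alpha_n = 0.5 / (n + 1) ** 0.8            # illustrative step sizes
        err = y - G(theta, x)
        h = H(theta, x)
        theta = [th + alpha_n * err * hi for th, hi in zip(theta, h)]
    return theta

def target(x):
    return psi(2.0 * x)                           # hypothetical data-generating map

theta0 = [0.1, 0.1]
theta_final = train(theta0, target)
loss_before = mean_square_error(theta0, target)
loss_after = mean_square_error(theta_final, target)
```

Comparing `loss_before` and `loss_after` shows the mean-square error being driven down by the recursion using one training sample per iteration.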

The asymptotic behavior of algorithm (4.1) is analyzed under the following
assumptions:

Assumption 4.1. $\psi(\cdot)$ is real-analytic. Moreover, $\psi(\cdot)$ has a (complex-valued)
continuation $\hat\psi(\cdot)$ with the following properties:

(i) $\hat\psi(z)$ maps $z \in \mathbb{C}$ into $\mathbb{C}$ ($\mathbb{C}$ denotes the set of complex numbers).

(ii) $\hat\psi(x) = \psi(x)$ for all $x \in \mathbb{R}$.

(iii) There exist real numbers $\varepsilon \in (0, 1)$, $K \in [1, \infty)$ such that $\hat\psi(\cdot)$ is analytic on
$V_\varepsilon = \{z \in \mathbb{C} : d(z, \mathbb{R}) \leq \varepsilon\}$, and such that

$$\max\{|\hat\psi(z)|, |\hat\psi'(z)|\} \leq K$$

for all $z \in V_\varepsilon$ ($\hat\psi'(\cdot)$ is the first derivative of $\hat\psi(\cdot)$).

Assumption 4.2. $\{(X_n, Y_n)\}_{n \geq 0}$ are i.i.d. random variables distributed according
to the probability measure $\pi(\cdot, \cdot)$. There exists a real number $L \in [1, \infty)$ such that
$\|X_0\| \leq L$ and $|Y_0| \leq L$ w.p.1.

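Our main results on the properties of objective function $f(\cdot)$ and algorithm (4.1)
are contained in the next two theorems.

Theorem 4.1. Let Assumptions 4.1 and 4.2 hold. Then, $f(\cdot)$ is analytic on
entire $\mathbb{R}^{d_\theta}$, i.e., it satisfies Assumption 2.3.

Theorem 4.2. Let Assumptions 3.1, 4.1 and 4.2 hold. Then, the following is
true:

(i) $\hat\theta = \lim_{n \to \infty} \theta_n$ exists and satisfies $\nabla f(\hat\theta) = 0$ w.p.1 on $\{\sup_{n \geq 0} \|\theta_n\| < \infty\}$.

(ii) $\|\nabla f(\theta_n)\|^2 = o(\gamma_n^{-\hat p})$, $|f(\theta_n) - f(\hat\theta)| = o(\gamma_n^{-\hat p})$ and $\|\theta_n - \hat\theta\| = o(\gamma_n^{-\hat q})$ w.p.1
on $\{\sup_{n \geq 0} \|\theta_n\| < \infty\} \cap \{\hat r > r\}$.

(iii) $\|\nabla f(\theta_n)\|^2 = O(\gamma_n^{-\hat p})$, $|f(\theta_n) - f(\hat\theta)| = O(\gamma_n^{-\hat p})$ and $\|\theta_n - \hat\theta\| = O(\gamma_n^{-\hat q})$
w.p.1 on $\{\sup_{n \geq 0} \|\theta_n\| < \infty\} \cap \{\hat r \leq r\}$.

(iv) $\|\nabla f(\theta_n)\|^2 = o(\gamma_n^{-p})$ and $|f(\theta_n) - f(\hat\theta)| = o(\gamma_n^{-p})$ w.p.1 on
$\{\sup_{n \geq 0} \|\theta_n\| < \infty\}$.

The proofs are provided in Section 8. $p$, $\hat p$, $\hat q$ and $\hat r$ are defined in Theorem 2.2
and Corollary 2.1.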

Assumption 4.1 is related to the network activation function. It holds when $\psi(\cdot)$
is a logistic function² or a standard Gaussian density³, which are the most popular
activation functions in feedforward neural networks. Assumption 4.2 corresponds to
the training sequence $\{(X_n, Y_n)\}_{n \geq 0}$, and is common for the analysis of supervised
learning.

² Complex-valued logistic function can be defined as $h(z) = (1 + \exp(-z))^{-1}$ for $z \in \mathbb{C}$. Since

$$|1 + \exp(-z)|^2 = 1 + \exp(-2\mathrm{Re}(z)) + 2\exp(-\mathrm{Re}(z))\cos(\mathrm{Im}(z)) \geq 1 + \exp(-2\mathrm{Re}(z))$$

when $|\mathrm{Im}(z)| \leq \pi/2$, $h(\cdot)$ is analytic on $\{z \in \mathbb{C} : d(z, \mathbb{R}) \leq \pi/2\}$. Due to the same reason,
$\max\{|h(z)|, |h'(z)|\} \leq 1$ on $\{z \in \mathbb{C} : d(z, \mathbb{R}) \leq \pi/2\}$.

³ Complex-valued standard Gaussian density is defined by $h(z) = (2\pi)^{-1/2} \exp(-z^2/2)$ for $z \in \mathbb{C}$.
It is analytic on entire $\mathbb{C}$. As

$$(1 + |z|) |\exp(-z^2/2)| \leq (1 + |\mathrm{Re}(z)| + |\mathrm{Im}(z)|) \exp(-\mathrm{Re}^2(z)/2 + \mathrm{Im}^2(z)/2) \leq 3e$$

when $|\mathrm{Im}(z)| \leq 1$, we have $\max\{|h(z)|, |h'(z)|\} \leq 3e$ on $\{z \in \mathbb{C} : d(z, \mathbb{R}) \leq 1\}$.
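The bounds stated for the complex-valued logistic function can be spot-checked numerically. The sketch below evaluates $h(z) = (1 + \exp(-z))^{-1}$ and its derivative $h'(z) = h(z)(1 - h(z))$ on a grid inside the strip $|\mathrm{Im}(z)| \leq \pi/2$. The grid limits and the small floating-point tolerance are choices made for this example, and a grid check is of course no substitute for the argument in the footnote.

```python
import cmath
import math

def h(z):
    return 1.0 / (1.0 + cmath.exp(-z))           # complex-valued logistic function

def h_prime(z):
    return h(z) * (1.0 - h(z))                   # derivative of the logistic function

def check_strip(re_max=5.0, im_max=1.5, n=61):
    """Check |1 + exp(-z)|^2 >= 1 + exp(-2 Re z) and max(|h|, |h'|) <= 1
    on a grid with |Im z| <= im_max < pi/2 (up to rounding tolerance)."""
    ok = True
    for i in range(n):
        for j in range(n):
            z = complex(-re_max + 2.0 * re_max * i / (n - 1),
                        -im_max + 2.0 * im_max * j / (n - 1))
            lower = 1.0 + math.exp(-2.0 * z.real)
            ok = ok and abs(1.0 + cmath.exp(-z)) ** 2 >= lower * (1.0 - 1e-12)
            ok = ok and max(abs(h(z)), abs(h_prime(z))) <= 1.0 + 1e-12
    return ok

bounds_hold = check_strip()
```

The check stays strictly inside the strip because $1 + \exp(-z)$ vanishes at $z = \pm i\pi$, where the logistic function has poles.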
The asymptotic properties of supervised learning algorithms have been studied in
a large number of papers (see [10], [11] and references cited therein). Unfortunately,
the available literature does not provide any information on the point-convergence
and convergence rate which can be verified for feedforward neural networks with
nonlinear activation functions. The main difficulty is that the
existing results on the convergence and convergence rate of stochastic gradient search
require the objective function $f(\cdot)$ to have an isolated minimum $\theta_*$ such that $\nabla^2 f(\theta_*)$
is strictly positive definite and such that $\{\theta_n\}_{n \geq 0}$ visits the attraction domain of $\theta_*$
infinitely many times w.p.1. Since $f(\cdot)$ is highly nonlinear, these requirements are not
only hard (if possible at all) to show, but are rather likely not to hold. Theorem 4.2
does not invoke any such requirements and covers some of the most widely used
feedforward neural networks.

5. Example 2: Identification of Linear Stochastic Dynamical Systems. 

In this section, the general results presented in Sections [2] and [3] are applied to the 
asymptotic analysis of recursive prediction error algorithms for identification of linear 
stochastic systems. To avoid unnecessary technical details and complicated notation, 
only the identification of one dimensional ARMA models is considered here. However, 
it is straightforward to generalize the obtained results to any linear stochastic system. 

To state the problem of the recursive prediction error identification in ARMA
models, we use the following notation. $M$ and $N$ are positive integers. For $a_1, \dots, a_M \in \mathbb{R}$
and $b_1, \dots, b_N \in \mathbb{R}$, let

$$A_\theta(z) = 1 - \sum_{k=1}^{M} a_k z^{-k}, \qquad B_\theta(z) = 1 + \sum_{k=1}^{N} b_k z^{-k},$$

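where $\theta = [a_1 \cdots a_M \; b_1 \cdots b_N]^T$ and $z \in \mathbb{C}$ ($\mathbb{C}$ denotes the set of complex numbers).
Moreover, let $d_\theta = M + N$ and

$$\Theta = \{\theta \in \mathbb{R}^{d_\theta} : B_\theta(z) = 0 \Rightarrow |z| > 1\}.$$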

$\{Y_n\}_{n\ge 0}$ is a real-valued signal generated by the actual system (i.e., by the system
being identified). For $\theta \in \Theta$, $\{Y_n^\theta\}_{n\ge 0}$ is the output of the ARMA model

$$A_\theta(q) Y_n^\theta = B_\theta(q) W_n, \quad n \ge 0, \qquad (5.1)$$

where $\{W_n\}_{n\ge 0}$ is a real-valued white noise and $q^{-1}$ is the backward time-shift operator.
$\{\varepsilon_n^\theta\}_{n\ge 0}$ is the process generated by the recursion

$$B_\theta(q)\varepsilon_n^\theta = A_\theta(q) Y_n, \quad n \ge 0, \qquad (5.2)$$

while $\hat Y_n^\theta = Y_n - \varepsilon_n^\theta$ for $n \ge 0$. $\hat Y_n^\theta$ represents a mean-square optimal estimate of $Y_n$
given $Y_0, \dots, Y_{n-1}$ (which the model (5.1) can provide; for details see e.g., [TB], [H]).
Consequently, $\varepsilon_n^\theta$ can be interpreted as the estimation error of $\hat Y_n^\theta$.
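As a rough illustration of recursion (5.2), the following Python sketch computes the prediction errors for a fixed parameter vector. It assumes the sign convention implied by (5.4), i.e. $\varepsilon_n = Y_n - \sum_k a_k Y_{n-k} - \sum_k b_k \varepsilon_{n-k}$, together with zero initial conditions; the function name `prediction_errors` is hypothetical and is not taken from the paper.

```python
def prediction_errors(y, a, b):
    """Prediction errors eps_0, ..., eps_{T-1} for a fixed theta = (a, b).

    Assumed convention (matching (5.4) with theta frozen):
        eps_n = y_n - sum_k a[k-1] * y_{n-k} - sum_k b[k-1] * eps_{n-k},
    with y_m = eps_m = 0 for m < 0 (zero initial conditions, an assumption).
    """
    eps = []
    for n in range(len(y)):
        # autoregressive part: uses past observations only
        ar = sum(a[k - 1] * y[n - k] for k in range(1, len(a) + 1) if n - k >= 0)
        # moving-average part: uses past prediction errors only
        ma = sum(b[k - 1] * eps[n - k] for k in range(1, len(b) + 1) if n - k >= 0)
        eps.append(y[n] - ar - ma)
    return eps


errors = prediction_errors([1.0, 2.0, 0.5], a=[0.2], b=[0.5])
```

The one-step predictor is then recovered as `y[n] - errors[n]`, mirroring the relation between the predictor and the prediction error stated above.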

The parametric identification in ARMA models can be stated as follows: Given
a realization of $\{Y_n\}_{n\ge 0}$, estimate the values of $\theta$ for which the model (5.1) provides
the best approximation to the signal $\{Y_n\}_{n\ge 0}$. If the identification is based on the
prediction error principle, this estimation problem reduces to the minimization of the
asymptotic mean-square prediction error

$$f(\theta) = \frac{1}{2}\lim_{n\to\infty} E\big((\varepsilon_n^\theta)^2\big)$$

over $\Theta$. As the asymptotic value of the second moment of $\varepsilon_n^\theta$ is rarely available
analytically, $f(\cdot)$ is minimized by a stochastic gradient (or stochastic Newton) algorithm.
Such an algorithm is defined by the following difference equations:

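$$\phi_n = [Y_n \cdots Y_{n-M+1} \;\; \varepsilon_n \cdots \varepsilon_{n-N+1}]^T, \qquad (5.3)$$
$$\varepsilon_{n+1} = Y_{n+1} - \phi_n^T\theta_n, \qquad (5.4)$$
$$\psi_{n+1} = \phi_n - [\psi_n \cdots \psi_{n-N+1}]\,D\,\theta_n, \qquad (5.5)$$
$$\theta_{n+1} = \theta_n + \alpha_n\psi_{n+1}\varepsilon_{n+1}, \quad n \ge 0. \qquad (5.6)$$

In this recursion, $\{\alpha_n\}_{n\ge 0}$ denotes a sequence of positive reals. $D$ is an $N \times (M + N)$
matrix whose entries are $d_{i,j} = 1$ if $j = M + i$, $1 \le i \le N$, and $d_{i,j} = 0$ otherwise.
$\{Y_n\}_{n\ge -M}$ is a real-valued stochastic process defined on a probability space $(\Omega, \mathcal{F}, P)$,
while $\theta_0 \in \Theta$, $\varepsilon_0, \dots, \varepsilon_{1-N} \in \mathbb{R}$ and $\psi_0, \dots, \psi_{1-N} \in \mathbb{R}^{d_\theta}$ are random variables defined
on the same probability space. $\theta_0, \varepsilon_0, \dots, \varepsilon_{1-N}, \psi_0, \dots, \psi_{1-N}$ represent the initial
conditions of the algorithm (5.3) - (5.6).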
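A minimal Python sketch of recursion (5.3) - (5.6) is given below. It is only an illustration under stated assumptions: observations with negative time index and all initial $\varepsilon$, $\psi$ values are set to zero, no projection onto $\Theta$ is applied, and the names `rpe_arma` and `step` are hypothetical. The product $D\theta_n$ is implemented by selecting the last $N$ entries of $\theta_n$, which is what the definition of $D$ amounts to.

```python
def rpe_arma(y, M, N, theta0, step):
    """Recursive prediction error iterates for an ARMA(M, N) model.

    y      : observations Y_0, Y_1, ... (values before time 0 are taken as 0)
    theta0 : initial estimate [a_1..a_M, b_1..b_N]
    step   : callable n -> alpha_n (positive step size)
    Returns the list [theta_0, theta_1, ...].
    """
    d = M + N
    theta = list(theta0)
    y_hist = [y[0]] + [0.0] * (M - 1)            # Y_n, ..., Y_{n-M+1}
    eps_hist = [0.0] * N                          # eps_n, ..., eps_{n-N+1}
    psi_hist = [[0.0] * d for _ in range(N)]      # psi_n, ..., psi_{n-N+1}
    thetas = [list(theta)]
    for n in range(len(y) - 1):
        phi = y_hist + eps_hist                                        # (5.3)
        eps_next = y[n + 1] - sum(p * t for p, t in zip(phi, theta))   # (5.4)
        b = theta[M:]                                                  # D * theta_n
        psi_next = [phi[j] - sum(b[k] * psi_hist[k][j] for k in range(N))
                    for j in range(d)]                                 # (5.5)
        a_n = step(n)
        theta = [theta[j] + a_n * psi_next[j] * eps_next for j in range(d)]  # (5.6)
        # shift the finite histories by one time step
        y_hist = [y[n + 1]] + y_hist[:-1]
        eps_hist = [eps_next] + eps_hist[:-1]
        psi_hist = [psi_next] + psi_hist[:-1]
        thetas.append(list(theta))
    return thetas


iterates = rpe_arma([1.0, 2.0, 0.5], M=1, N=1, theta0=[0.0, 0.0], step=lambda n: 0.1)
```

In practice a projection or truncation step would be added after the update of `theta` to keep the estimates in $\Theta$; it is left out here for brevity.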

In the literature on system identification, recursion (5.3) - (5.6) is known as
the recursive prediction error algorithm for ARMA models (for more details see [16],
[17] and references cited therein). It usually involves a projection (or truncation)
device which ensures that estimates $\{\theta_n\}_{n\ge 0}$ remain in $\Theta$. However, in order to avoid
unnecessary technical details and to keep the exposition as concise as possible, this
aspect of algorithm (5.3) - (5.6) is not discussed here. Instead, similarly as in [TS] -
[17], we state our asymptotic results (Theorem 5.2) in a local form.

Algorithm (5.3) - (5.6) is analyzed under the following assumptions:
Assumption 5.1. There exist a positive integer $L$, a matrix $A \in \mathbb{R}^{L\times L}$, a vector
$b \in \mathbb{R}^L$ and $\mathbb{R}^L$-valued stochastic processes $\{X_n\}_{n\ge -M}$, $\{V_n\}_{n\ge -M}$ (defined on
$(\Omega, \mathcal{F}, P)$) such that the following holds:

(i) $X_{n+1} = AX_n + V_n$ and $Y_n = b^T X_n$ for $n \ge -M$.

(ii) The eigenvalues of $A$ lie in $\{z \in \mathbb{C} : |z| < 1\}$.

(iii) $\{V_n\}_{n\ge -M}$ are i.i.d. and independent of $\theta_0, X_{-M}, \varepsilon_0, \dots, \varepsilon_{1-N}, \psi_0, \dots, \psi_{1-N}$.

(iv) <oo. 

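Assumption 5.2. For any compact set $Q \subset \Theta$,

$$\sup_{n\ge 0} E\big((\varepsilon_n^4 + \|\psi_n\|^4)\, I_{\{\tau_Q > n\}}\big) < \infty \qquad (5.7)$$

where $\tau_Q = \inf\{n \ge 0 : \theta_n \notin Q\}$.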

Our main result on the analyticity of $f(\cdot)$ is contained in the next theorem.
Theorem 5.1. Suppose that $\{Y_n\}_{n\ge 0}$ is a weakly stationary process such that

$$\sum_{n=0}^{\infty} |\mathrm{Cov}(Y_0, Y_n)| < \infty.$$

Then, $f(\cdot)$ is analytic on entire $\Theta$, i.e., the following is true: For any compact set
$Q \subset \Theta$ and any $a \in f(Q)$, there exist real numbers $\delta_{Q,a} \in (0,1]$, $\mu_{Q,a} \in (1,2]$,
$M_{Q,a} \in [1,\infty)$ such that (2.3) is satisfied for all $\theta \in Q$ fulfilling $|f(\theta) - a| \le \delta_{Q,a}$.
Let $\Lambda$ be the event defined by

$$\Lambda = \Big\{ \sup_{n\ge 0}\|\theta_n\| < \infty, \; \inf_{n\ge 0} d(\theta_n, \partial\Theta) > 0 \Big\}.$$

Then, our main result on the convergence and convergence rate of algorithm (5.3) -
(5.6) reads as follows.

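Theorem 5.2. Let Assumptions 3.1, 5.1 and 5.2 hold. Then, the following is
true:

(i) $\hat\theta = \lim_{n\to\infty}\theta_n$ exists and satisfies $\nabla f(\hat\theta) = 0$ w.p.1 on $\Lambda$.

(ii) $\|\nabla f(\theta_n)\|^2 = o(\gamma_n^{-\hat p})$, $|f(\theta_n) - f(\hat\theta)| = o(\gamma_n^{-\hat p})$ and $\|\theta_n - \hat\theta\| = o(\gamma_n^{-\hat q})$ w.p.1
on $\Lambda \cap \{\hat r > r\}$.

(iii) $\|\nabla f(\theta_n)\|^2 = O(\gamma_n^{-\hat p})$, $|f(\theta_n) - f(\hat\theta)| = O(\gamma_n^{-\hat p})$ and $\|\theta_n - \hat\theta\| = O(\gamma_n^{-\hat q})$
on $\Lambda \cap \{\hat r \le r\}$.

(iv) $\|\nabla f(\theta_n)\|^2 = O(\gamma_n^{-p})$ and $|f(\theta_n) - f(\hat\theta)| = O(\gamma_n^{-p})$ on $\Lambda$.

The proofs are provided in Section [51. $p$, $\hat p$, $\hat q$ and $\hat r$ are defined in Theorem 2.2
and Corollary 2.1.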

Assumption 5.1 corresponds to the signal $\{Y_n\}_{n\ge 0}$. It is quite common for the
asymptotic analysis of recursive identification algorithms (see e.g., [1, Part I]) and
covers all stable linear Markov models. Assumption 5.2 is related to the stability
of subrecursion (5.3) - (5.5) and its output $\{\varepsilon_n\}_{n\ge 0}$, $\{\psi_n\}_{n\ge 0}$. In this or a similar
form, Assumption 5.2 is involved in most of the asymptotic results on the recursive
prediction error identification algorithms. E.g., [16, Theorems 4.1 - 4.3] (which are
probably the most general results of this kind) require sequence $\{(\varepsilon_n, \psi_n)\}_{n\ge 0}$ to visit
a fixed compact set infinitely often w.p.1 on event $\Lambda$. When $\{Y_n\}_{n\ge 0}$ is generated by
a stable linear Markov system, such a requirement is practically equivalent to (5.7).

Various aspects of recursive prediction error identification in linear stochastic
systems have been the subject of numerous papers and books (see [16], [17] and
references cited therein). Despite providing a deep insight into the asymptotic behavior of
recursive prediction error identification algorithms, the available results do not offer
information about the point-convergence and convergence rate which can be verified
for models of a moderate or high order (e.g., $M$ and $N$ are three or above). The main
difficulty is the same as in the case of supervised learning. The existing results on the
convergence and convergence rate of stochastic gradient search require $f(\cdot)$ to have
an isolated minimum $\theta_*$ such that $\nabla^2 f(\theta_*)$ is strictly positive definite and such that
$\{\theta_n\}_{n\ge 0}$ visits the attraction domain of $\theta_*$ infinitely many times w.p.1. Unfortunately,
$f(\cdot)$ is so complex (even for relatively small $M$ and $N$) that these requirements are
not only impossible to verify, but are likely not to be true. Apparently, Theorem 5.2
relies on none of them.

Regarding Theorems 5.1 and 5.2, it should be mentioned that these results can be
generalized in several ways. E.g., it is straightforward to extend them to practically
any stable multiple-input, multiple-output linear system. Moreover, it is possible to
show that the results also hold for signals $\{Y_n\}_{n\ge 0}$ satisfying mixing conditions of the
type [m Condition SI, p. 169].

6. Proof of Theorems 2.1 and 2.2. In this section, the following notation is
used. Let $\Lambda$ be the event

$$\Lambda = \Big\{ \sup_{n\ge 0}\|\theta_n\| < \infty \Big\}.$$

For $\varepsilon \in (0,\infty)$, let
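For $0 \le n \le k$, let $\zeta_{n,n} = \zeta'_{n,n} = \zeta''_{n,n} = 0$, $\phi_{n,n} = \phi'_{n,n} = \phi''_{n,n} = 0$ and

$$\zeta'_{n,k} = \sum_{i=n}^{k-1}\alpha_i\xi_i, \qquad \zeta''_{n,k} = \sum_{i=n}^{k-1}\alpha_i\big(\nabla f(\theta_i) - \nabla f(\theta_n)\big), \qquad \zeta_{n,k} = \zeta'_{n,k} + \zeta''_{n,k},$$

$$\phi'_{n,k} = (\nabla f(\theta_n))^T\zeta_{n,k},$$

$$\phi''_{n,k} = -\int_0^1 \big(\nabla f(\theta_n + s(\theta_k - \theta_n)) - \nabla f(\theta_n)\big)^T(\theta_k - \theta_n)\,ds, \qquad \phi_{n,k} = \phi'_{n,k} + \phi''_{n,k}.$$

Then, it is straightforward to show

$$\theta_k - \theta_n = -\sum_{i=n}^{k-1}\alpha_i\big(\nabla f(\theta_i) + \xi_i\big) = -(\gamma_k - \gamma_n)\nabla f(\theta_n) - \zeta_{n,k}, \qquad (6.1)$$
$$f(\theta_k) - f(\theta_n) = -(\gamma_k - \gamma_n)\|\nabla f(\theta_n)\|^2 - \phi_{n,k} \qquad (6.2)$$

for $0 \le n \le k$.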

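In this section, besides the quantities introduced in the previous paragraph, we
rely on the following notation. For a compact set $Q \subset \mathbb{R}^{d_\theta}$, $C_Q$ stands for an upper
bound of $\|\nabla f(\cdot)\|$ on $Q$ and for a Lipschitz constant of $\nabla f(\cdot)$ on the same set. $\hat A$ is
the set of accumulation points of $\{\theta_n\}_{n\ge 0}$, while

$$\hat f = \liminf_{n\to\infty} f(\theta_n).$$

$\hat B$ and $\hat Q$ are random sets defined by

$$\hat B = \bigcup_{\theta\in\hat A}\{\theta' \in \mathbb{R}^{d_\theta} : \|\theta' - \theta\| < \delta_\theta/2\}, \qquad \hat Q = \mathrm{cl}(\hat B)$$

on event $\Lambda$, and by

$$\hat B = \hat A, \qquad \hat Q = \hat A$$

outside $\Lambda$ ($\delta_\theta$ is specified in Remark 2.1). Overriding the definition of $\hat\mu$, $\hat p$, $\hat r$ in
Theorem 2.2, we specify random quantities $\hat\delta$, $\hat\mu$, $\hat p$, $\hat r$, $\hat C$, $\hat M$ as

$$\hat\delta = \delta_{\hat Q,\hat f}, \quad \hat\mu = \mu_{\hat Q,\hat f}, \quad \hat C = C_{\hat Q}, \quad \hat M = M_{\hat Q,\hat f}, \quad \hat r = \begin{cases} 1/(2-\hat\mu), & \text{if } \hat\mu < 2 \\ \infty, & \text{if } \hat\mu = 2 \end{cases}, \quad \hat p = \hat\mu\min\{r,\hat r\}$$

on $\Lambda$ ($\delta_{Q,a}$, $\mu_{Q,a}$, $M_{Q,a}$ are specified in Assumption 2.3), and as

$$\hat\delta = 1, \quad \hat\mu = 2, \quad \hat C = 1, \quad \hat M = 1, \quad \hat r = \infty, \quad \hat p = 2r$$

outside $\Lambda$ (later, when Theorem 2.1 is proved, it will be clear that $\hat\mu$, $\hat p$, $\hat r$ specified
here coincide with $\hat\mu$, $\hat p$, $\hat r$ defined in Theorem 2.2). Functions $u(\cdot)$, $v(\cdot)$ are defined by

$$u(\theta) = f(\theta) - \hat f, \qquad v(\theta) = \begin{cases} (f(\theta) - \hat f)^{-1/\hat p}, & \text{if } f(\theta) > \hat f \\ 0, & \text{otherwise} \end{cases}$$

for $\theta \in \mathbb{R}^{d_\theta}$.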

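Remark 6.1. On event $\Lambda$, $\hat Q$ is compact and satisfies $\hat A \subset \mathrm{int}\,\hat Q$. Thus, $\hat\delta$, $\hat p$, $\hat r$,
$\hat C$, $\hat M$, $v(\cdot)$ are well-defined on $\Lambda$ (what happens with these quantities outside $\Lambda$ does
not affect the results presented in this section). On the other side, Assumption 2.3
implies

$$|f(\theta) - \hat f| \le \hat M\|\nabla f(\theta)\|^{\hat\mu} \qquad (6.3)$$

on $\Lambda$ for all $\theta \in \hat Q$ satisfying $|f(\theta) - \hat f| \le \hat\delta$.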
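To make inequality (6.3) concrete, the following Python sketch checks a Lojasiewicz-type bound numerically for the toy objective $f(\theta) = \theta^4$, whose minimum at zero has a vanishing second derivative. The toy function, the grid, and the helper names `lojasiewicz_holds` and `rate_exponents` are assumptions made only for illustration; the formulas for $\hat r$ and $\hat p$ follow the relations $\hat r = 1/(2-\hat\mu)$ and $\hat p = \hat\mu\min\{r,\hat r\}$ used later in this section.

```python
import math


def lojasiewicz_holds(f, grad, thetas, f_star, M, mu, delta):
    """Check |f(theta) - f_star| <= M * |grad(theta)|**mu on a grid,
    restricted to grid points with |f(theta) - f_star| <= delta."""
    for th in thetas:
        gap = abs(f(th) - f_star)
        if gap <= delta and gap > M * abs(grad(th)) ** mu + 1e-15:
            return False
    return True


def rate_exponents(mu, r):
    """r_hat = 1/(2 - mu) if mu < 2 (infinity if mu == 2); p_hat = mu * min(r, r_hat)."""
    r_hat = math.inf if mu == 2 else 1.0 / (2.0 - mu)
    return r_hat, mu * min(r, r_hat)


f = lambda th: th ** 4            # flat minimum at 0 (second derivative is zero)
grad = lambda th: 4.0 * th ** 3
grid = [i / 1000.0 for i in range(-1000, 1001)]

ok = lojasiewicz_holds(f, grad, grid, f_star=0.0, M=1.0, mu=4.0 / 3.0, delta=1.0)
bad = lojasiewicz_holds(f, grad, grid, f_star=0.0, M=1.0, mu=2.0, delta=1.0)
r_hat, p_hat = rate_exponents(4.0 / 3.0, r=3.0)
```

For this toy function the bound holds with exponent $4/3$ but fails with exponent $2$ near the origin, which is the situation the results of this section are designed to cover.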

Remark 6.2. Regarding the notation, the following note is also in order: The tilde symbol
is used for a locally defined quantity, i.e., for a quantity whose definition holds only
in the proof where such a quantity appears.

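Lemma 6.1. Let Assumptions 2.1 and 2.2 hold. Then, there exists an event
$N_0 \in \mathcal{F}$ such that $P(N_0) = 0$ and

$$\limsup_{n\to\infty}\gamma_n^r\max_{n\le k\le a(n,1)}\|\zeta'_{n,k}\| \le \xi < \infty$$

on $\Lambda\setminus N_0$.

Proof. It is straightforward to verify

$$\zeta'_{n,k} = \sum_{i=n}^{k-1}\big(\gamma_i^{-r} - \gamma_{i+1}^{-r}\big)\Big(\sum_{j=n}^{i}\alpha_j\gamma_j^r\xi_j\Big) + \gamma_k^{-r}\sum_{j=n}^{k-1}\alpha_j\gamma_j^r\xi_j$$

for $0 \le n < k$. Consequently,

$$\|\zeta'_{n,k}\| \le \Big(\gamma_k^{-r} + \sum_{i=n}^{k-1}\big(\gamma_i^{-r} - \gamma_{i+1}^{-r}\big)\Big)\max_{n\le j<a(n,1)}\Big\|\sum_{i=n}^{j}\alpha_i\gamma_i^r\xi_i\Big\| = \gamma_n^{-r}\max_{n\le j<a(n,1)}\Big\|\sum_{i=n}^{j}\alpha_i\gamma_i^r\xi_i\Big\|$$

for $0 \le n < k \le a(n,1)$. Thus,

$$\gamma_n^r\|\zeta'_{n,k}\| \le \max_{n\le j<a(n,1)}\Big\|\sum_{i=n}^{j}\alpha_i\gamma_i^r\xi_i\Big\|$$

for $0 \le n < k \le a(n,1)$. Then, the lemma's assertion directly follows from Assumption
2.2. □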
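Lemma 6.1 and the results that follow are stated over windows of the form $n \le k \le a(n,t)$. The sketch below shows how such windows can be computed, assuming the definitions $\gamma_n = \sum_{i=0}^{n-1}\alpha_i$ and $a(n,t) = \max\{k \ge n : \gamma_k - \gamma_n \le t\}$; these definitions come from an earlier part of the paper that is not reproduced in this section, so they should be read as assumptions. The function names are hypothetical.

```python
def gamma_sequence(alphas):
    """Return [gamma_0, gamma_1, ...] with gamma_0 = 0 and
    gamma_n = alpha_0 + ... + alpha_{n-1}."""
    gammas = [0.0]
    for a in alphas:
        gammas.append(gammas[-1] + a)
    return gammas


def a_index(gammas, n, t):
    """Largest k >= n (within the available horizon) with gamma_k - gamma_n <= t.
    Relies on gammas being non-decreasing, i.e. on positive step sizes."""
    k = n
    while k + 1 < len(gammas) and gammas[k + 1] - gammas[n] <= t:
        k += 1
    return k


gammas = gamma_sequence([0.25] * 10)   # constant steps, chosen only for the example
window_end = a_index(gammas, 0, 1.0)
```

With decreasing step sizes the windows contain more and more iterations, while their length on the $\gamma$ scale stays close to $t$.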

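Lemma 6.2. Suppose that Assumptions 2.1 - 2.3 hold. Then, there exist random
quantities $C_1$, $\hat t$ (which are deterministic functions of $\hat C$) and for any real number
$\varepsilon \in (0,\infty)$, there exists a non-negative integer-valued random quantity $\tau_{1,\varepsilon}$ such that
the following is true: $1 \le C_1 < \infty$, $0 < \hat t < 1$, $0 \le \tau_{1,\varepsilon} < \infty$ everywhere and

$$\max_{n\le k\le a(n,\hat t)}\|\theta_k - \theta_n\| \le C_1\big(\|\nabla f(\theta_n)\| + \gamma_n^{-r}(\xi+\varepsilon)\big), \qquad (6.4)$$

$$\max_{n\le k\le a(n,\hat t)}\big(f(\theta_k) - f(\theta_n)\big) \le C_1\big(\gamma_n^{-r}\|\nabla f(\theta_n)\|(\xi+\varepsilon) + \gamma_n^{-2r}(\xi+\varepsilon)^2\big), \qquad (6.5)$$

$$f(\theta_{a(n,\hat t)}) - f(\theta_n) + \hat t\|\nabla f(\theta_n)\|^2/2 \le C_1\big(\gamma_n^{-r}\|\nabla f(\theta_n)\|(\xi+\varepsilon) + \gamma_n^{-2r}(\xi+\varepsilon)^2\big), \qquad (6.6)$$

$$2\big(f(\theta_{a(n,\hat t)}) - f(\theta_n)\big) + \hat t\|\nabla f(\theta_n)\|^2/2 + \|\nabla f(\theta_n)\|\,\|\theta_{a(n,\hat t)} - \theta_n\| \le C_1\big(\gamma_n^{-r}\|\nabla f(\theta_n)\|(\xi+\varepsilon) + \gamma_n^{-2r}(\xi+\varepsilon)^2\big) \qquad (6.7)$$

on $\Lambda\setminus N_0$ for $n > \tau_{1,\varepsilon}$.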

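Proof. Let $\tilde C_1 = 2\hat C\exp(\hat C)$, $\tilde C_2 = 2\hat C\tilde C_1$, $\tilde C_3 = 2\hat C\tilde C_1^2 + \tilde C_2$, $\tilde C_4 = \tilde C_2 + 2\tilde C_3$, while
$C_1 = \tilde C_4$, $\hat t = 1/(4\tilde C_4)$. Moreover, let $\varepsilon \in (0,\infty)$ be an arbitrary real number. Then,
owing to Lemma 6.1 and the fact that $\gamma_{a(n,\hat t)} - \gamma_n = \hat t + O(\alpha_{a(n,\hat t)})$ for $n \to \infty$, it
is possible to construct a non-negative integer-valued random quantity $\tau_{1,\varepsilon}$ such that
$0 \le \tau_{1,\varepsilon} < \infty$ everywhere and such that $\theta_n \in \hat Q$,

$$\gamma_{a(n,\hat t)} - \gamma_n \ge 2\hat t/3, \qquad (6.8)$$

$$\max_{n\le k\le a(n,1)}\|\zeta'_{n,k}\| \le \gamma_n^{-r}(\xi+\varepsilon) \qquad (6.9)$$

on $\Lambda\setminus N_0$ for $n > \tau_{1,\varepsilon}$.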

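Let $\omega$ be an arbitrary sample from $\Lambda\setminus N_0$ (notice that all formulas which follow
in the proof correspond to this sample). Since $\theta_n \in \hat Q$ for $n > \tau_{1,\varepsilon}$, (6.1), (6.9) yield

$$\begin{aligned}
\|\nabla f(\theta_k)\| &\le \|\nabla f(\theta_n)\| + \|\nabla f(\theta_k) - \nabla f(\theta_n)\| \\
&\le \|\nabla f(\theta_n)\| + \hat C\|\theta_k - \theta_n\| \\
&\le \|\nabla f(\theta_n)\| + \hat C\sum_{i=n}^{k-1}\alpha_i\|\nabla f(\theta_i)\| + \hat C\|\zeta'_{n,k}\| \\
&\le \|\nabla f(\theta_n)\| + \hat C\gamma_n^{-r}(\xi+\varepsilon) + \hat C\sum_{i=n}^{k-1}\alpha_i\|\nabla f(\theta_i)\|
\end{aligned}$$

for $\tau_{1,\varepsilon} < n \le k \le a(n,1)$. Then, Bellman-Gronwall inequality implies

$$\begin{aligned}
\|\nabla f(\theta_k)\| &\le \big(\|\nabla f(\theta_n)\| + \hat C\gamma_n^{-r}(\xi+\varepsilon)\big)\exp\big(\hat C(\gamma_k - \gamma_n)\big) \\
&\le \hat C\exp(\hat C)\big(\|\nabla f(\theta_n)\| + \gamma_n^{-r}(\xi+\varepsilon)\big)
\end{aligned}$$

for $\tau_{1,\varepsilon} < n \le k \le a(n,1)$ (notice that $\gamma_k - \gamma_n \le \gamma_{a(n,1)} - \gamma_n \le 1$ when $n \le k \le a(n,1)$).
Consequently, (6.9) gives

$$\begin{aligned}
\|\theta_k - \theta_n\| &\le \sum_{i=n}^{k-1}\alpha_i\|\nabla f(\theta_i)\| + \|\zeta'_{n,k}\| \\
&\le \hat C\exp(\hat C)\big(\|\nabla f(\theta_n)\| + \gamma_n^{-r}(\xi+\varepsilon)\big)(\gamma_k - \gamma_n) + \gamma_n^{-r}(\xi+\varepsilon) \\
&\le \tilde C_1\big((\gamma_k - \gamma_n)\|\nabla f(\theta_n)\| + \gamma_n^{-r}(\xi+\varepsilon)\big) \qquad (6.10)
\end{aligned}$$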
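for $\tau_{1,\varepsilon} < n \le k \le a(n,1)$. Therefore, (6.9) yields

$$\begin{aligned}
\|\zeta_{n,k}\| &\le \|\zeta'_{n,k}\| + \hat C\sum_{i=n}^{k-1}\alpha_i\|\theta_i - \theta_n\| \\
&\le \gamma_n^{-r}(\xi+\varepsilon) + \hat C\tilde C_1\big((\gamma_k - \gamma_n)\|\nabla f(\theta_n)\| + \gamma_n^{-r}(\xi+\varepsilon)\big)(\gamma_k - \gamma_n) \\
&\le \tilde C_2\big((\gamma_k - \gamma_n)^2\|\nabla f(\theta_n)\| + \gamma_n^{-r}(\xi+\varepsilon)\big) \qquad (6.11)
\end{aligned}$$

for $\tau_{1,\varepsilon} < n \le k \le a(n,1)$. Thus,

$$\begin{aligned}
|\phi_{n,k}| &\le \|\nabla f(\theta_n)\|\,\|\zeta_{n,k}\| + \hat C\|\theta_k - \theta_n\|^2 \\
&\le \tilde C_2\big((\gamma_k - \gamma_n)^2\|\nabla f(\theta_n)\|^2 + \gamma_n^{-r}\|\nabla f(\theta_n)\|(\xi+\varepsilon)\big) + \hat C\tilde C_1^2\big((\gamma_k - \gamma_n)\|\nabla f(\theta_n)\| + \gamma_n^{-r}(\xi+\varepsilon)\big)^2 \\
&\le \tilde C_3\big((\gamma_k - \gamma_n)^2\|\nabla f(\theta_n)\|^2 + \gamma_n^{-r}\|\nabla f(\theta_n)\|(\xi+\varepsilon) + \gamma_n^{-2r}(\xi+\varepsilon)^2\big) \qquad (6.12)
\end{aligned}$$

for $\tau_{1,\varepsilon} < n \le k \le a(n,1)$.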

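Owing to (6.2), (6.12), we have

$$\begin{aligned}
f(\theta_k) - f(\theta_n) &\le -(\gamma_k - \gamma_n)\|\nabla f(\theta_n)\|^2 + |\phi_{n,k}| \\
&\le -\big(1 - \tilde C_3(\gamma_k - \gamma_n)\big)(\gamma_k - \gamma_n)\|\nabla f(\theta_n)\|^2 + \tilde C_3\big(\gamma_n^{-r}\|\nabla f(\theta_n)\|(\xi+\varepsilon) + \gamma_n^{-2r}(\xi+\varepsilon)^2\big) \qquad (6.13)
\end{aligned}$$

for $\tau_{1,\varepsilon} < n \le k \le a(n,1)$. Since

$$\tilde C_3(\gamma_k - \gamma_n) \le \tilde C_4(\gamma_k - \gamma_n) \le \tilde C_4(\gamma_{a(n,\hat t)} - \gamma_n) \le \tilde C_4\hat t = 1/4 \qquad (6.14)$$

for $0 \le n \le k \le a(n,\hat t)$, (6.13) yields

$$f(\theta_k) - f(\theta_n) \le -3(\gamma_k - \gamma_n)\|\nabla f(\theta_n)\|^2/4 + \tilde C_3\big(\gamma_n^{-r}\|\nabla f(\theta_n)\|(\xi+\varepsilon) + \gamma_n^{-2r}(\xi+\varepsilon)^2\big) \qquad (6.15)$$

for $\tau_{1,\varepsilon} < n \le k \le a(n,\hat t)$. As an immediate consequence of (6.8), (6.10), (6.15) we
get that (6.4) - (6.6) hold for $n > \tau_{1,\varepsilon}$ (notice that $\gamma_k - \gamma_n \le 1$ for $n \le k \le a(n,1)$).
Due to (6.1), we have

$$(\gamma_k - \gamma_n)\|\nabla f(\theta_n)\|^2 = \|\nabla f(\theta_n)\|\,\|(\gamma_k - \gamma_n)\nabla f(\theta_n)\| = \|\nabla f(\theta_n)\|\,\|\theta_k - \theta_n + \zeta_{n,k}\|$$

for $0 \le n \le k$. Combining this with (6.2), (6.12) and the first part of (6.11), we get

$$\begin{aligned}
2\big(f(\theta_k) - f(\theta_n)\big) &= -\|\nabla f(\theta_n)\|\,\|\theta_k - \theta_n + \zeta_{n,k}\| - (\gamma_k - \gamma_n)\|\nabla f(\theta_n)\|^2 - 2\phi_{n,k} \\
&\le -\|\nabla f(\theta_n)\|\,\|\theta_k - \theta_n\| - (\gamma_k - \gamma_n)\|\nabla f(\theta_n)\|^2 + \|\nabla f(\theta_n)\|\,\|\zeta_{n,k}\| + 2|\phi_{n,k}| \\
&\le -\|\nabla f(\theta_n)\|\,\|\theta_k - \theta_n\| - (\gamma_k - \gamma_n)\|\nabla f(\theta_n)\|^2 + \tilde C_4(\gamma_k - \gamma_n)^2\|\nabla f(\theta_n)\|^2 \\
&\quad + \tilde C_4\big(\gamma_n^{-r}\|\nabla f(\theta_n)\|(\xi+\varepsilon) + \gamma_n^{-2r}(\xi+\varepsilon)^2\big) \\
&= -\|\nabla f(\theta_n)\|\,\|\theta_k - \theta_n\| - \big(1 - \tilde C_4(\gamma_k - \gamma_n)\big)(\gamma_k - \gamma_n)\|\nabla f(\theta_n)\|^2 \\
&\quad + \tilde C_4\big(\gamma_n^{-r}\|\nabla f(\theta_n)\|(\xi+\varepsilon) + \gamma_n^{-2r}(\xi+\varepsilon)^2\big)
\end{aligned}$$

for $\tau_{1,\varepsilon} < n \le k \le a(n,1)$. Consequently, (6.14) yields

$$2\big(f(\theta_k) - f(\theta_n)\big) \le -\|\nabla f(\theta_n)\|\,\|\theta_k - \theta_n\| - 3(\gamma_k - \gamma_n)\|\nabla f(\theta_n)\|^2/4 + \tilde C_4\big(\gamma_n^{-r}\|\nabla f(\theta_n)\|(\xi+\varepsilon) + \gamma_n^{-2r}(\xi+\varepsilon)^2\big)$$

for $\tau_{1,\varepsilon} < n \le k \le a(n,\hat t)$. Then, (6.8) implies that (6.7) is true for $n > \tau_{1,\varepsilon}$. □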
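The Bellman-Gronwall step in the proof above can be read through the following discrete form of the inequality, stated here as a standard fact rather than as the exact version the paper has in mind. Suppose $c \ge 0$, $C \ge 0$, $\alpha_i \ge 0$ and $x_k \ge 0$ satisfy $x_k \le c + C\sum_{i=n}^{k-1}\alpha_i x_i$ for $n \le k \le m$. Then

$$x_k \le c\prod_{i=n}^{k-1}(1 + C\alpha_i) \le c\exp\Big(C\sum_{i=n}^{k-1}\alpha_i\Big) = c\exp\big(C(\gamma_k - \gamma_n)\big), \qquad n \le k \le m,$$

where the last equality assumes $\gamma_k - \gamma_n = \sum_{i=n}^{k-1}\alpha_i$. In the proof this is applied with $x_k = \|\nabla f(\theta_k)\|$, $c = \|\nabla f(\theta_n)\| + \hat C\gamma_n^{-r}(\xi+\varepsilon)$ and $C = \hat C$. The first bound follows by induction on $k$: writing $S_k = c + C\sum_{i=n}^{k-1}\alpha_i x_i$, one has $S_{k+1} = S_k + C\alpha_k x_k \le (1 + C\alpha_k)S_k$.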

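Lemma 6.3. Suppose that Assumptions 2.1 - 2.3 hold. Then, $\lim_{n\to\infty}\nabla f(\theta_n) = 0$
on $\Lambda\setminus N_0$.

Proof. The lemma's assertion is proved by contradiction. We assume that
$\limsup_{n\to\infty}\|\nabla f(\theta_n)\| > 0$ for some sample $\omega \in \Lambda\setminus N_0$ (notice that all formulas which
follow in the proof correspond to this sample). Then, there exist $a \in (0,\infty)$ and an
increasing sequence $\{l_k\}_{k\ge 0}$ (both depending on $\omega$) such that $\liminf_{k\to\infty}\|\nabla f(\theta_{l_k})\| > a$.
Since $\liminf_{k\to\infty} f(\theta_{a(l_k,\hat t)}) \ge \hat f$, Lemma 6.2 (inequality (6.6)) gives

$$\hat f - \liminf_{k\to\infty} f(\theta_{l_k}) \le \limsup_{k\to\infty}\big(f(\theta_{a(l_k,\hat t)}) - f(\theta_{l_k})\big) \le -(\hat t/2)\liminf_{k\to\infty}\|\nabla f(\theta_{l_k})\|^2 \le -a^2\hat t/2.$$

Therefore, $\liminf_{k\to\infty} f(\theta_{l_k}) \ge \hat f + a^2\hat t/2$. Consequently, there exist $b, c \in \mathbb{R}$ (depending
on $\omega$) such that $\hat f < b < c < \hat f + a^2\hat t/2$, $b < \hat f + \hat\delta$ and $\limsup_{n\to\infty} f(\theta_n) > c$.
Thus, there exist sequences $\{m_k\}_{k\ge 0}$, $\{n_k\}_{k\ge 0}$ (depending on $\omega$) with the following
properties: $m_k < n_k < m_{k+1}$, $f(\theta_{m_k}) < b$, $f(\theta_{n_k}) > c$ and

$$\min_{m_k < n \le n_k} f(\theta_n) \ge b \qquad (6.16)$$

for $k \ge 0$. Then, Lemma 6.2 (inequality (6.5)) implies

$$\limsup_{k\to\infty}\big(f(\theta_{m_k+1}) - f(\theta_{m_k})\big) \le 0, \qquad (6.17)$$
$$\limsup_{k\to\infty}\max_{m_k\le n\le a(m_k,\hat t)}\big(f(\theta_n) - f(\theta_{m_k})\big) \le 0. \qquad (6.18)$$

Since

$$b > f(\theta_{m_k}) = f(\theta_{m_k+1}) - \big(f(\theta_{m_k+1}) - f(\theta_{m_k})\big) \ge b - \big(f(\theta_{m_k+1}) - f(\theta_{m_k})\big)$$

for $k \ge 0$, (6.17) yields $\lim_{k\to\infty} f(\theta_{m_k}) = b$. As $f(\theta_{n_k}) - f(\theta_{m_k}) > c - b$ for $k \ge 0$, (6.18)
implies $a(m_k,\hat t) < n_k$ for all but finitely many $k$ (otherwise, $\liminf_{k\to\infty}(f(\theta_{n_k}) - f(\theta_{m_k})) \le 0$
would follow from (6.18)). Consequently, $\liminf_{k\to\infty} f(\theta_{a(m_k,\hat t)}) \ge b$
(due to (6.16)), while Lemma 6.2 (inequality (6.6)) gives

$$0 \le \limsup_{k\to\infty} f(\theta_{a(m_k,\hat t)}) - b = \limsup_{k\to\infty}\big(f(\theta_{a(m_k,\hat t)}) - f(\theta_{m_k})\big) \le -(\hat t/2)\liminf_{k\to\infty}\|\nabla f(\theta_{m_k})\|^2.$$

Therefore, $\lim_{k\to\infty}\|\nabla f(\theta_{m_k})\| = 0$. Moreover, there exists $k_0 \ge 0$ (depending on $\omega$)
such that $\theta_{m_k} \in \hat Q$ and $f(\theta_{m_k}) \ge (\hat f + b)/2$ for $k \ge k_0$ (notice that $\lim_{k\to\infty} f(\theta_{m_k}) =
b > (\hat f + b)/2$). Consequently, $\theta_{m_k} \in \hat Q$ and $0 < (b - \hat f)/2 \le f(\theta_{m_k}) - \hat f \le \hat\delta$ for $k \ge k_0$
(notice that $f(\theta_{m_k}) < b < \hat f + \hat\delta$ for $k \ge 0$). Then, owing to (6.3) (i.e., to Assumption
2.3), we have

$$0 < (b - \hat f)/2 \le f(\theta_{m_k}) - \hat f \le \hat M\|\nabla f(\theta_{m_k})\|^{\hat\mu}$$

for $k \ge k_0$. However, this directly contradicts the fact $\lim_{k\to\infty}\|\nabla f(\theta_{m_k})\| = 0$.
Hence, $\lim_{n\to\infty}\nabla f(\theta_n) = 0$ on $\Lambda\setminus N_0$. □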

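Lemma 6.4. Suppose that Assumptions 2.1 - 2.3 hold. Then, $\lim_{n\to\infty} f(\theta_n) = \hat f$
on $\Lambda\setminus N_0$.

Proof. We use contradiction to prove the lemma's assertion: Suppose that $\hat f <
\limsup_{n\to\infty} f(\theta_n)$ for some sample $\omega \in \Lambda\setminus N_0$ (notice that all formulas which follow
in the proof correspond to this sample). Then, there exists $a \in \mathbb{R}$ (depending on $\omega$)
such that $\hat f < a < \hat f + \hat\delta$ and $\limsup_{n\to\infty} f(\theta_n) > a$. Thus, there exists an increasing
sequence $\{n_k\}_{k\ge 0}$ (depending on $\omega$) such that $f(\theta_{n_k}) < a$ and $f(\theta_{n_k+1}) \ge a$ for $k \ge 0$.
On the other side, Lemma 6.2 (inequality (6.5)) implies

$$\limsup_{k\to\infty}\big(f(\theta_{n_k+1}) - f(\theta_{n_k})\big) \le 0. \qquad (6.19)$$

Since

$$a > f(\theta_{n_k}) = f(\theta_{n_k+1}) - \big(f(\theta_{n_k+1}) - f(\theta_{n_k})\big) \ge a - \big(f(\theta_{n_k+1}) - f(\theta_{n_k})\big)$$

for $k \ge 0$, (6.19) yields $\lim_{k\to\infty} f(\theta_{n_k}) = a$. Moreover, there exists $k_0 \ge 0$ (depending
on $\omega$) such that $\theta_{n_k} \in \hat Q$ and $f(\theta_{n_k}) \ge (\hat f + a)/2$ for $k \ge k_0$ (notice that
$\lim_{k\to\infty} f(\theta_{n_k}) = a > (\hat f + a)/2$). Thus, $\theta_{n_k} \in \hat Q$ and $0 < (a - \hat f)/2 \le f(\theta_{n_k}) - \hat f \le \hat\delta$
for $k \ge k_0$ (notice that $f(\theta_{n_k}) < a < \hat f + \hat\delta$ for $k \ge 0$). Then, due to (6.3) (i.e., to
Assumption 2.3), we have

$$0 < (a - \hat f)/2 \le f(\theta_{n_k}) - \hat f \le \hat M\|\nabla f(\theta_{n_k})\|^{\hat\mu}$$

for $k \ge k_0$. However, this directly contradicts the fact $\lim_{n\to\infty}\nabla f(\theta_n) = 0$. Hence,
$\lim_{n\to\infty} f(\theta_n) = \hat f$ on $\Lambda\setminus N_0$. □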
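Lemmas 6.3 and 6.4 assert that the gradient and the objective value settle down along the iterates. A toy run can be sketched in Python as follows, assuming the recursion $\theta_{n+1} = \theta_n - \alpha_n(\nabla f(\theta_n) + \xi_n)$ from the earlier part of the paper (not reproduced in this section), the scalar objective $f(\theta) = \theta^4$, and step sizes $\alpha_n = 0.1/(n+1)^{0.6}$. All of these choices, as well as the name `sgd_search`, are illustrative assumptions.

```python
import random


def sgd_search(grad, theta0, step, n_steps, noise_std=0.0, seed=0):
    """Scalar stochastic gradient search
        theta_{n+1} = theta_n - alpha_n * (grad(theta_n) + xi_n),
    where xi_n is zero-mean Gaussian noise (switched off when noise_std == 0)."""
    rng = random.Random(seed)
    theta = theta0
    path = [theta]
    for n in range(n_steps):
        xi = rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0
        theta = theta - step(n) * (grad(theta) + xi)
        path.append(theta)
    return path


# noise-free run on f(theta) = theta**4, whose minimum at 0 is degenerate
path = sgd_search(lambda th: 4.0 * th ** 3, 1.0,
                  lambda n: 0.1 / (n + 1) ** 0.6, 10000)
```

In the noise-free run the iterates decrease monotonically towards the degenerate minimum at zero, but only at a polynomial rate, which is the regime $\hat\mu < 2$ treated in the lemmas below. Setting `noise_std` to a positive value gives a noisy variant of the same experiment.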

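Lemma 6.5. Suppose that Assumptions 2.1 - 2.3 hold. Then, there exist random
quantities $C_2$, $C_3$ (which are deterministic functions of $\hat p$, $\hat C$, $\hat M$) and for any real
number $\varepsilon \in (0,\infty)$, there exists a non-negative integer-valued random quantity $\tau_{2,\varepsilon}$
such that the following is true: $1 \le C_2, C_3 < \infty$, $0 \le \tau_{2,\varepsilon} < \infty$ everywhere and

$$\big(u(\theta_{a(n,\hat t)}) - u(\theta_n) + \hat t\|\nabla f(\theta_n)\|^2/4\big)\, I_{A_{n,\varepsilon}} \le 0, \qquad (6.20)$$
$$\big(u(\theta_{a(n,\hat t)}) - u(\theta_n) + (\hat t/C_3)\,u(\theta_n)\big)\, I_{B_{n,\varepsilon}} \le 0, \qquad (6.21)$$
$$\big(v(\theta_{a(n,\hat t)}) - v(\theta_n) - (\hat t/C_3)(\varphi_\varepsilon(\xi))^{-\hat\mu/\hat p}\big)\, I_{C_{n,\varepsilon}} \ge 0 \qquad (6.22)$$

on $\Lambda\setminus N_0$ for $n > \tau_{2,\varepsilon}$, where

$$A_{n,\varepsilon} = \{\gamma_n^{\hat p}|u(\theta_n)| \ge C_2(\varphi_\varepsilon(\xi))^{\hat\mu}\} \cup \{\gamma_n^{\hat p}\|\nabla f(\theta_n)\|^2 \ge C_2(\varphi_\varepsilon(\xi))^{\hat\mu}\},$$
$$B_{n,\varepsilon} = \{\gamma_n^{\hat p}u(\theta_n) \ge C_2(\varphi_\varepsilon(\xi))^{\hat\mu}\} \cap \{\hat\mu = 2\},$$
$$C_{n,\varepsilon} = \{\gamma_n^{\hat p}u(\theta_n) \ge C_2(\varphi_\varepsilon(\xi))^{\hat\mu}\} \cap \{u(\theta_{a(n,\hat t)}) > 0\} \cap \{\hat\mu < 2\}.$$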
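Remark 6.3. Inequalities (6.20) - (6.22) can be represented in the following
equivalent form: Relations

$$\big(\gamma_n^{\hat p}|u(\theta_n)| \ge C_2(\varphi_\varepsilon(\xi))^{\hat\mu} \;\vee\; \gamma_n^{\hat p}\|\nabla f(\theta_n)\|^2 \ge C_2(\varphi_\varepsilon(\xi))^{\hat\mu}\big) \wedge n > \tau_{2,\varepsilon}$$
$$\Longrightarrow\; u(\theta_{a(n,\hat t)}) \le u(\theta_n) - \hat t\|\nabla f(\theta_n)\|^2/4, \qquad (6.23)$$

$$\gamma_n^{\hat p}u(\theta_n) \ge C_2(\varphi_\varepsilon(\xi))^{\hat\mu} \wedge \hat\mu = 2 \wedge n > \tau_{2,\varepsilon}$$
$$\Longrightarrow\; u(\theta_{a(n,\hat t)}) \le (1 - \hat t/C_3)\,u(\theta_n), \qquad (6.24)$$

$$\gamma_n^{\hat p}u(\theta_n) \ge C_2(\varphi_\varepsilon(\xi))^{\hat\mu} \wedge u(\theta_{a(n,\hat t)}) > 0 \wedge \hat\mu < 2 \wedge n > \tau_{2,\varepsilon}$$
$$\Longrightarrow\; v(\theta_{a(n,\hat t)}) \ge v(\theta_n) + (\hat t/C_3)(\varphi_\varepsilon(\xi))^{-\hat\mu/\hat p} \qquad (6.25)$$

are true on $\Lambda\setminus N_0$.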

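Proof. Let $\tilde C = 8C_1/\hat t$, $C_2 = \tilde C^2\hat M$ and $C_3 = 4\hat p\hat M^2$. Moreover, let $\varepsilon \in (0,\infty)$
be an arbitrary real number. Then, owing to Lemmas 6.1 and 6.4, it is possible to
construct a non-negative integer-valued random quantity $\tau_{2,\varepsilon}$ such that $\tau_{1,\varepsilon} \le \tau_{2,\varepsilon} < \infty$
everywhere and such that $\theta_n \in \hat Q$, $|u(\theta_n)| \le \hat\delta$,

$$\gamma_n^{-\hat p/2}(\varphi_\varepsilon(\xi))^{\hat\mu/2} \ge \gamma_n^{-r}(\xi+\varepsilon), \qquad (6.26)$$
$$\gamma_n^{-\hat p/\hat\mu}\varphi_\varepsilon(\xi) \ge \gamma_n^{-r}(\xi+\varepsilon) \qquad (6.27)$$

on $\Lambda\setminus N_0$ for $n > \tau_{2,\varepsilon}$.* Since $\tau_{2,\varepsilon} \ge \tau_{1,\varepsilon}$ on $\Lambda\setminus N_0$, Lemma 6.2 (inequality (6.6))
yields

$$u(\theta_{a(n,\hat t)}) - u(\theta_n) \le -\hat t\|\nabla f(\theta_n)\|^2/2 + C_1\big(\gamma_n^{-r}\|\nabla f(\theta_n)\|(\xi+\varepsilon) + \gamma_n^{-2r}(\xi+\varepsilon)^2\big) \qquad (6.28)$$

on $\Lambda\setminus N_0$ for $n > \tau_{2,\varepsilon}$. As $\theta_n \in \hat Q$ and $|u(\theta_n)| \le \hat\delta$ on $\Lambda\setminus N_0$ for $n > \tau_{2,\varepsilon}$, (6.3) (i.e.,
Assumption 2.3) implies

$$|u(\theta_n)| \le \hat M\|\nabla f(\theta_n)\|^{\hat\mu} \qquad (6.29)$$

on $\Lambda\setminus N_0$ for $n > \tau_{2,\varepsilon}$.

Let $\omega$ be an arbitrary sample from $\Lambda\setminus N_0$ (notice that all formulas which follow
in the proof correspond to this sample). First, we show (6.20). We proceed by
contradiction: Suppose that (6.20) is violated for some $n > \tau_{2,\varepsilon}$. Therefore,

$$u(\theta_{a(n,\hat t)}) - u(\theta_n) > -\hat t\|\nabla f(\theta_n)\|^2/4 \qquad (6.30)$$

and at least one of the following two inequalities is true:

$$|u(\theta_n)| \ge C_2\gamma_n^{-\hat p}(\varphi_\varepsilon(\xi))^{\hat\mu}, \qquad (6.31)$$
$$\|\nabla f(\theta_n)\|^2 \ge C_2\gamma_n^{-\hat p}(\varphi_\varepsilon(\xi))^{\hat\mu}. \qquad (6.32)$$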



* To conclude that (6.26) holds on $\Lambda\setminus N_0$ for all but finitely many $n$, notice that $\hat p/2 < \min\{r,\hat r\} \le r$
when $\hat\mu < 2$ and that the left and right hand sides of the inequality in (6.26) are equal when $\hat\mu = 2$.
In order to deduce that (6.27) is true on $\Lambda\setminus N_0$ for all but finitely many $n$, notice that $\hat p/\hat\mu = r$,
$\varphi_\varepsilon(\xi) \ge \xi + \varepsilon$ when $r \le \hat r$ and that $\hat p/\hat\mu = \hat r < r$ when $r > \hat r$.
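If (6.31) holds, then (6.27), (6.29) imply

$$\|\nabla f(\theta_n)\| \ge \big(|u(\theta_n)|/\hat M\big)^{1/\hat\mu} \ge (C_2/\hat M)^{1/\hat\mu}\gamma_n^{-\hat p/\hat\mu}\varphi_\varepsilon(\xi) \ge \tilde C\gamma_n^{-r}(\xi+\varepsilon)$$

(notice that $(C_2/\hat M)^{1/\hat\mu} = \tilde C^{2/\hat\mu} \ge \tilde C$ owing to $\hat\mu \le 2$). On the other side, if (6.32) is
satisfied, then (6.26) yields

$$\|\nabla f(\theta_n)\| \ge C_2^{1/2}\gamma_n^{-\hat p/2}(\varphi_\varepsilon(\xi))^{\hat\mu/2} \ge \tilde C\gamma_n^{-r}(\xi+\varepsilon).$$

Thus, as a result of one of (6.31), (6.32), we get

$$\|\nabla f(\theta_n)\| \ge \tilde C\gamma_n^{-r}(\xi+\varepsilon).$$

Consequently,

$$\hat t\|\nabla f(\theta_n)\|^2/8 \ge (\tilde C\hat t/8)\gamma_n^{-r}\|\nabla f(\theta_n)\|(\xi+\varepsilon) = C_1\gamma_n^{-r}\|\nabla f(\theta_n)\|(\xi+\varepsilon),$$
$$\hat t\|\nabla f(\theta_n)\|^2/8 \ge (\tilde C^2\hat t/8)\gamma_n^{-2r}(\xi+\varepsilon)^2 \ge C_1\gamma_n^{-2r}(\xi+\varepsilon)^2$$

(notice that $\tilde C\hat t/8 = C_1$, $\tilde C^2\hat t/8 \ge \tilde C\hat t/8 = C_1$). Combining this with (6.28), we get

$$u(\theta_{a(n,\hat t)}) - u(\theta_n) \le -\hat t\|\nabla f(\theta_n)\|^2/4, \qquad (6.33)$$

which directly contradicts (6.30). Hence, (6.20) is true for $n > \tau_{2,\varepsilon}$. Then, as a result
of (6.29) and the fact that $B_{n,\varepsilon} \subseteq A_{n,\varepsilon}$ for $n \ge 0$, we get

$$\begin{aligned}
\big(u(\theta_{a(n,\hat t)}) - u(\theta_n) + (\hat t/C_3)\,u(\theta_n)\big) I_{B_{n,\varepsilon}}
&\le \big(u(\theta_{a(n,\hat t)}) - u(\theta_n) + (\hat t\hat M/C_3)\|\nabla f(\theta_n)\|^2\big) I_{B_{n,\varepsilon}} \\
&\le \big(u(\theta_{a(n,\hat t)}) - u(\theta_n) + \hat t\|\nabla f(\theta_n)\|^2/4\big) I_{B_{n,\varepsilon}} \le 0
\end{aligned}$$

for $n > \tau_{2,\varepsilon}$ (notice that $u(\theta_n) > 0$ on $B_{n,\varepsilon}$ for each $n \ge 0$; also notice that $C_3 \ge 4\hat M$).
Thus, (6.21) is true for $n > \tau_{2,\varepsilon}$.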

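Now, let us prove (6.22). To do so, we again use contradiction: Suppose that
(6.22) does not hold for some $n > \tau_{2,\varepsilon}$. Consequently, we have $\hat\mu < 2$, $u(\theta_{a(n,\hat t)}) > 0$
and

$$\gamma_n^{\hat p}u(\theta_n) \ge C_2(\varphi_\varepsilon(\xi))^{\hat\mu} > 0, \qquad (6.34)$$
$$v(\theta_{a(n,\hat t)}) - v(\theta_n) < (\hat t/C_3)(\varphi_\varepsilon(\xi))^{-\hat\mu/\hat p}. \qquad (6.35)$$

Combining (6.34) with (already proved) (6.20), we get (6.33), while $\hat\mu < 2$ implies

$$2/\hat\mu = 1 + 1/(\hat\mu\hat r) \le 1 + 1/\hat p \qquad (6.36)$$

(notice that $\hat r = 1/(2 - \hat\mu)$ owing to $\hat\mu < 2$; also notice that $\hat p = \hat\mu\min\{r,\hat r\} \le \hat\mu\hat r$). As
$0 < u(\theta_n) \le \hat\delta \le 1$ (due to (6.34) and the definition of $\tau_{2,\varepsilon}$), inequalities (6.29), (6.36)
yield

$$\|\nabla f(\theta_n)\|^2 \ge \big(u(\theta_n)/\hat M\big)^{2/\hat\mu} \ge (u(\theta_n))^{1+1/\hat p}/\hat M^2 \qquad (6.37)$$

(notice that $\hat M^{2/\hat\mu} \le \hat M^2$ due to $\hat\mu \le 2$, $\hat M \ge 1$). Since $\|\nabla f(\theta_n)\| > 0$ and $0 <
u(\theta_{a(n,\hat t)}) < u(\theta_n)$ (due to (6.33), (6.37)), inequalities (6.33), (6.37) give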



<M2 



Therefore, 



which directly contradicts (6.35). Thus, (6.22) is satisfied for $n > \tau_{2,\varepsilon}$. □

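Lemma 6.6. Suppose that Assumptions 2.1 - 2.3 hold. Then, there exists a
random quantity $C_4$ (which is a deterministic function of $\hat C$) such that the following
is true: $1 \le C_4 < \infty$ everywhere and

$$\|\theta_{a(n,\hat t)} - \theta_n\| \le -\gamma_n^{s}\big(u(\theta_{a(n,\hat t)}) - u(\theta_n)\big)(\phi_{\varepsilon,s}(\xi))^{-1} + C_4\gamma_n^{-s}\phi_{\varepsilon,s}(\xi) \qquad (6.38)$$

on $\Lambda\setminus N_0$ for $n > \tau_{1,\varepsilon}$ and any $\varepsilon \in (0,\infty)$, $s \in (1,r]$, where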



1 + ^ + £, if r = s,r > f 
iy9e(^), otherwise 



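Proof. Let $\varepsilon \in (0,\infty)$, $s \in (1,r]$ be arbitrary real numbers, while $C_4 = 10C_1^2/\hat t$.
Moreover, let $\omega$ be an arbitrary sample from $\Lambda\setminus N_0$ (notice that all formulas which
follow in the proof correspond to this sample), while $n > \tau_{1,\varepsilon}$ is an arbitrary integer.
To prove (6.38), we consider separately the cases $\|\nabla f(\theta_n)\| \ge (4C_1/\hat t)\gamma_n^{-s}\phi_{\varepsilon,s}(\xi)$ and
$\|\nabla f(\theta_n)\| < (4C_1/\hat t)\gamma_n^{-s}\phi_{\varepsilon,s}(\xi)$.

Case $\|\nabla f(\theta_n)\| \ge (4C_1/\hat t)\gamma_n^{-s}\phi_{\varepsilon,s}(\xi)$: Since $s \le r$ and $\phi_{\varepsilon,s}(\xi) \ge \xi + \varepsilon$, we have

$$\|\nabla f(\theta_n)\| \ge (4C_1/\hat t)\gamma_n^{-r}(\xi+\varepsilon).$$

Therefore,

$$(\hat t/4)\|\nabla f(\theta_n)\|^2 \ge C_1\gamma_n^{-r}\|\nabla f(\theta_n)\|(\xi+\varepsilon),$$
$$(\hat t/4)\|\nabla f(\theta_n)\|^2 \ge (4C_1^2/\hat t)\gamma_n^{-2r}(\xi+\varepsilon)^2 \ge C_1\gamma_n^{-2r}(\xi+\varepsilon)^2.$$

Then, Lemma 6.2 (inequality (6.7)) yields

$$\begin{aligned}
\|\nabla f(\theta_n)\|\,\|\theta_{a(n,\hat t)} - \theta_n\| &\le -2\big(u(\theta_{a(n,\hat t)}) - u(\theta_n)\big) - \hat t\|\nabla f(\theta_n)\|^2/2 \\
&\quad + C_1\big(\gamma_n^{-r}\|\nabla f(\theta_n)\|(\xi+\varepsilon) + \gamma_n^{-2r}(\xi+\varepsilon)^2\big) \\
&\le -2\big(u(\theta_{a(n,\hat t)}) - u(\theta_n)\big).
\end{aligned}$$

Consequently,

$$\begin{aligned}
\|\theta_{a(n,\hat t)} - \theta_n\| &\le -2\|\nabla f(\theta_n)\|^{-1}\big(u(\theta_{a(n,\hat t)}) - u(\theta_n)\big) \\
&\le -(2C_1/\hat t)^{-1}\gamma_n^{s}\big(u(\theta_{a(n,\hat t)}) - u(\theta_n)\big)(\phi_{\varepsilon,s}(\xi))^{-1}.
\end{aligned}$$

Hence, (6.38) is true when $\|\nabla f(\theta_n)\| \ge (4C_1/\hat t)\gamma_n^{-s}\phi_{\varepsilon,s}(\xi)$.

Case $\|\nabla f(\theta_n)\| < (4C_1/\hat t)\gamma_n^{-s}\phi_{\varepsilon,s}(\xi)$: As $s \le r$ and $\phi_{\varepsilon,s}(\xi) \ge \xi + \varepsilon$, Lemma 6.2
(inequalities (6.4), (6.5)) implies

$$\|\theta_{a(n,\hat t)} - \theta_n\| \le C_1\big(\|\nabla f(\theta_n)\| + \gamma_n^{-r}(\xi+\varepsilon)\big) \le (C_4/2)\gamma_n^{-s}\phi_{\varepsilon,s}(\xi),$$

$$u(\theta_{a(n,\hat t)}) - u(\theta_n) \le C_1\big(\gamma_n^{-r}\|\nabla f(\theta_n)\|(\xi+\varepsilon) + \gamma_n^{-2r}(\xi+\varepsilon)^2\big) \le (C_4/2)\gamma_n^{-2s}(\phi_{\varepsilon,s}(\xi))^2.$$

Combining this, we get

$$\|\theta_{a(n,\hat t)} - \theta_n\| \le -\gamma_n^{s}\big(u(\theta_{a(n,\hat t)}) - u(\theta_n)\big)(\phi_{\varepsilon,s}(\xi))^{-1} + C_4\gamma_n^{-s}\phi_{\varepsilon,s}(\xi).$$

Thus, (6.38) holds when $\|\nabla f(\theta_n)\| < (4C_1/\hat t)\gamma_n^{-s}\phi_{\varepsilon,s}(\xi)$. □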

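Lemma 6.7. Suppose that Assumptions 2.1 - 2.3 hold. Then,

$$u(\theta_n) \ge -C_2\gamma_n^{-\hat p}(\varphi_\varepsilon(\xi))^{\hat\mu} \qquad (6.39)$$

on $\Lambda\setminus N_0$ for $n > \tau_{2,\varepsilon}$ and any $\varepsilon \in (0,\infty)$. Furthermore, there exists a random
quantity $C_5 \in [1,\infty)$ (which is a deterministic function of $\hat p$, $\hat C$, $\hat M$) such that the
following is true: $1 \le C_5 < \infty$ everywhere and

$$\|\nabla f(\theta_n)\|^2 \le C_5\big(\psi(u(\theta_n)) + \gamma_n^{-\hat p}(\varphi_\varepsilon(\xi))^{\hat\mu}\big) \qquad (6.40)$$

on $\Lambda\setminus N_0$ for $n > \tau_{2,\varepsilon}$ and any $\varepsilon \in (0,\infty)$, where function $\psi(\cdot)$ is defined by $\psi(x) =
x\,I_{[0,\infty)}(x)$ for $x \in \mathbb{R}$.

Proof. Let $C_5 = 4C_2/\hat t$, while $\varepsilon \in (0,\infty)$ is an arbitrary real number. Moreover,
$\omega$ is an arbitrary sample from $\Lambda\setminus N_0$ (notice that all formulas which follow in the
proof correspond to this sample).

First, we prove (6.39). To do so, we use contradiction: Assume that (6.39) is not
satisfied for some $n > \tau_{2,\varepsilon}$. Define $\{n_k\}_{k\ge 0}$ recursively by $n_0 = n$ and $n_k = a(n_{k-1},\hat t)$
for $k \ge 1$. Let us show by induction that $\{u(\theta_{n_k})\}_{k\ge 0}$ is non-increasing: Suppose that
$u(\theta_{n_l}) \le u(\theta_{n_{l-1}})$ for $0 < l \le k$. Consequently,

$$u(\theta_{n_k}) \le u(\theta_{n_0}) < -C_2\gamma_{n_0}^{-\hat p}(\varphi_\varepsilon(\xi))^{\hat\mu} \le -C_2\gamma_{n_k}^{-\hat p}(\varphi_\varepsilon(\xi))^{\hat\mu}$$

(notice that $\{\gamma_n\}_{n\ge 0}$ is increasing). Then, Lemma 6.5 (relations (6.20), (6.23)) yields

$$u(\theta_{n_{k+1}}) - u(\theta_{n_k}) \le -\hat t\|\nabla f(\theta_{n_k})\|^2/4 \le 0,$$

i.e., $u(\theta_{n_{k+1}}) \le u(\theta_{n_k})$. Thus, $\{u(\theta_{n_k})\}_{k\ge 0}$ is non-increasing. Therefore,

$$\limsup_{k\to\infty} u(\theta_{n_k}) \le u(\theta_{n_0}) < 0.$$

However, this is not possible, as $\lim_{n\to\infty} u(\theta_n) = 0$ (due to Lemma 6.4). Hence, (6.39)
indeed holds for $n > \tau_{2,\varepsilon}$.

Now, (6.40) is demonstrated. Again, we proceed by contradiction: Suppose that
(6.40) is violated for some $n > \tau_{2,\varepsilon}$. Consequently,

$$\|\nabla f(\theta_n)\|^2 > C_5\gamma_n^{-\hat p}(\varphi_\varepsilon(\xi))^{\hat\mu} \ge C_2\gamma_n^{-\hat p}(\varphi_\varepsilon(\xi))^{\hat\mu}$$

(notice that $C_5 \ge C_2$), which, together with Lemma 6.5 (relations (6.20), (6.23)),
yields

$$u(\theta_{a(n,\hat t)}) - u(\theta_n) \le -\hat t\|\nabla f(\theta_n)\|^2/4.$$

Then, (6.39) implies

$$\begin{aligned}
\|\nabla f(\theta_n)\|^2 &\le (4/\hat t)\big(u(\theta_n) - u(\theta_{a(n,\hat t)})\big) \\
&\le (4/\hat t)\big(\psi(u(\theta_n)) + C_2\gamma_{a(n,\hat t)}^{-\hat p}(\varphi_\varepsilon(\xi))^{\hat\mu}\big) \\
&\le C_5\big(\psi(u(\theta_n)) + \gamma_n^{-\hat p}(\varphi_\varepsilon(\xi))^{\hat\mu}\big).
\end{aligned}$$

However, this directly contradicts our assumption that $n$ violates (6.40). Thus, (6.40)
is indeed satisfied for $n > \tau_{2,\varepsilon}$. □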

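Lemma 6.8. Suppose that Assumptions 2.1 - 2.3 hold. Then, there exists a
random quantity $C_6$ (which is a deterministic function of $\hat p$, $\hat C$, $\hat M$) such that the
following is true: $1 \le C_6 < \infty$ everywhere and

$$\liminf_{n\to\infty}\gamma_n^{\hat p}u(\theta_n) \le C_6(\varphi_\varepsilon(\xi))^{\hat\mu} \qquad (6.41)$$

on $\Lambda\setminus N_0$ for any $\varepsilon \in (0,\infty)$.

Proof. Let $C_6 = C_2 + C_3^{\hat p}$. We prove (6.41) by contradiction: Assume that (6.41)
is violated for some sample $\omega$ from $\Lambda\setminus N_0$ (notice that the formulas which follow in
the proof correspond to this sample) and some real number $\varepsilon \in (0,\infty)$. Consequently,
there exists $n_0 > \tau_{2,\varepsilon}$ (depending on $\omega$, $\varepsilon$) such that

$$u(\theta_n) \ge C_6\gamma_n^{-\hat p}(\varphi_\varepsilon(\xi))^{\hat\mu} \qquad (6.42)$$

for $n \ge n_0$. Let $\{n_k\}_{k\ge 0}$ be defined recursively by $n_k = a(n_{k-1},\hat t)$ for $k \ge 1$. In what
follows in the proof, we consider separately the cases $\hat\mu < 2$ and $\hat\mu = 2$.

Case $\hat\mu < 2$: Due to (6.42), we have

$$v(\theta_{n_k}) \le C_6^{-1/\hat p}\gamma_{n_k}(\varphi_\varepsilon(\xi))^{-\hat\mu/\hat p}.$$

On the other side, Lemma 6.5 (relations (6.22), (6.25)) and (6.42) yield

$$v(\theta_{n_{k+1}}) - v(\theta_{n_k}) \ge (\hat t/C_3)(\varphi_\varepsilon(\xi))^{-\hat\mu/\hat p} \ge (1/C_3)(\gamma_{n_{k+1}} - \gamma_{n_k})(\varphi_\varepsilon(\xi))^{-\hat\mu/\hat p}$$

for $k \ge 0$ (notice that $\hat t \ge \gamma_{n_{k+1}} - \gamma_{n_k}$). Therefore,

$$(1/C_3)(\gamma_{n_k} - \gamma_{n_0})(\varphi_\varepsilon(\xi))^{-\hat\mu/\hat p} \le \sum_{i=0}^{k-1}\big(v(\theta_{n_{i+1}}) - v(\theta_{n_i})\big) = v(\theta_{n_k}) - v(\theta_{n_0}) \le C_6^{-1/\hat p}\gamma_{n_k}(\varphi_\varepsilon(\xi))^{-\hat\mu/\hat p}$$

for $k \ge 1$. Thus,

$$1 - \gamma_{n_0}/\gamma_{n_k} \le C_3C_6^{-1/\hat p}$$

for $k \ge 1$. However, this is impossible, since the limit process $k \to \infty$ (applied to the
previous relation) yields $C_3 \ge C_6^{1/\hat p}$ (notice that $C_6 > C_3^{\hat p}$). Hence, (6.41) holds when
$\hat\mu < 2$.

Case $\hat\mu = 2$: As a result of Lemma 6.5 (relations (6.21), (6.24)) and (6.42), we
get

$$u(\theta_{n_{k+1}}) \le (1 - \hat t/C_3)\,u(\theta_{n_k}) \le \big(1 - (\gamma_{n_{k+1}} - \gamma_{n_k})/C_3\big)\,u(\theta_{n_k})$$

for $k \ge 0$. Consequently,

$$\begin{aligned}
u(\theta_{n_k}) &\le u(\theta_{n_0})\prod_{i=1}^{k}\big(1 - (\gamma_{n_i} - \gamma_{n_{i-1}})/C_3\big) \\
&\le u(\theta_{n_0})\exp\Big(-(1/C_3)\sum_{i=1}^{k}(\gamma_{n_i} - \gamma_{n_{i-1}})\Big) \\
&= u(\theta_{n_0})\exp\big(-(\gamma_{n_k} - \gamma_{n_0})/C_3\big)
\end{aligned}$$

for $k \ge 0$. Then, (6.42) yields

$$C_6(\varphi_\varepsilon(\xi))^{\hat\mu} \le u(\theta_{n_0})\gamma_{n_k}^{\hat p}\exp\big(-(\gamma_{n_k} - \gamma_{n_0})/C_3\big)$$

for $k \ge 0$. However, this is not possible, as the limit process $k \to \infty$ (applied to the
previous relation) implies $C_6(\varphi_\varepsilon(\xi))^{\hat\mu} \le 0$. Thus, (6.41) holds also when $\hat\mu = 2$. □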
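The passage from the product to the exponential in the case of the exponent equal to two rests on the elementary inequality $1 - x \le e^{-x}$. As a worked step, write $x_i = (\gamma_{n_i} - \gamma_{n_{i-1}})/C_3$ and assume, as the proof does, that $0 \le x_i \le 1$. Then

$$\prod_{i=1}^{k}(1 - x_i) \le \prod_{i=1}^{k}e^{-x_i} = \exp\Big(-\sum_{i=1}^{k}x_i\Big) = \exp\big(-(\gamma_{n_k} - \gamma_{n_0})/C_3\big),$$

where the last equality is a telescoping sum. The contradiction in the limit then relies on $\gamma_{n_k} \to \infty$ (which is taken here to follow from the step-size assumptions made earlier in the paper), since an exponentially decaying factor dominates any fixed power of $\gamma_{n_k}$.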

Lemma 6.9. Suppose that Assumptions 2.1 - 2.3 hold. Then, there exists a
random quantity $C_7$ (which is a deterministic function of $\hat p$, $\hat C$, $\hat M$) such that the
following is true: $1 \le C_7 < \infty$ everywhere and

$$\limsup_{n\to\infty}\gamma_n^{\hat p}u(\theta_n) \le C_7(\varphi_\varepsilon(\xi))^{\hat\mu} \qquad (6.43)$$

on $\Lambda\setminus N_0$ for any $\varepsilon \in (0,\infty)$.

Proof Let Ci = SCiCs, C2 = eCiCa + + Cg and Ct = 2{Ci + ^2)2. We 
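use contradiction to show (6.43): Suppose that (6.43) is violated for some sample $\omega$
from $\Lambda\setminus N_0$ (notice that the formulas which appear in the proof correspond to this
sample) and some real number $\varepsilon \in (0,\infty)$. Then, it can be deduced from Lemma 6.8
that there exist $n_0 > m_0 > \tau_{2,\varepsilon}$ (depending on $\omega$, $\varepsilon$) such that

$$\gamma_{m_0}^{\hat p}u(\theta_{m_0}) \le \tilde C_2(\varphi_\varepsilon(\xi))^{\hat\mu}, \qquad (6.44)$$
$$\gamma_{n_0}^{\hat p}u(\theta_{n_0}) \ge C_7(\varphi_\varepsilon(\xi))^{\hat\mu}, \qquad (6.45)$$
$$\min_{m_0 < n \le n_0}\gamma_n^{\hat p}u(\theta_n) \ge \tilde C_2(\varphi_\varepsilon(\xi))^{\hat\mu}, \qquad (6.46)$$
$$\max_{m_0 \le n < n_0}\gamma_n^{\hat p}u(\theta_n) < C_7(\varphi_\varepsilon(\xi))^{\hat\mu} \qquad (6.47)$$

(notice that $\tilde C_2 \ge C_6$) and such that

$$(\gamma_{a(m_0,\hat t)}/\gamma_{m_0})^{\hat p} \le \min\{2, (1 - \hat t/C_3)^{-1}\}, \qquad (6.48)$$
$$\gamma_{m_0}^{-2r}(\xi+\varepsilon)^2 \le \gamma_{m_0}^{-\hat p}(\varphi_\varepsilon(\xi))^{\hat\mu} \qquad (6.49)$$

(to see that (6.48) holds for all but finitely many $m_0$, notice that $\lim_{n\to\infty}\gamma_{a(n,\hat t)}/\gamma_n =
1$; to conclude that (6.49) is true for all but finitely many $m_0$, notice that $\hat p <
2\min\{r,\hat r\} \le 2r$ if $\hat\mu < 2$ and that the left and right-hand sides of (6.49) are equal
when $\hat\mu = 2$).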

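Let $l_0 = a(m_0,\hat t)$. As a direct consequence of Lemmas 6.2, 6.7 (relations (6.5),
(6.40)) and (6.49), we get

$$\begin{aligned}
u(\theta_n) - u(\theta_{m_0}) &\le C_1\big(\gamma_{m_0}^{-r}\|\nabla f(\theta_{m_0})\|(\xi+\varepsilon) + \gamma_{m_0}^{-2r}(\xi+\varepsilon)^2\big) \\
&\le C_1\big(\|\nabla f(\theta_{m_0})\|^2/2 + 3\gamma_{m_0}^{-2r}(\xi+\varepsilon)^2/2\big) \\
&\le C_1C_5\,\psi(u(\theta_{m_0})) + (2C_1 + C_1C_5)\gamma_{m_0}^{-\hat p}(\varphi_\varepsilon(\xi))^{\hat\mu} \\
&\le \tilde C_1\big(\psi(u(\theta_{m_0})) + \gamma_{m_0}^{-\hat p}(\varphi_\varepsilon(\xi))^{\hat\mu}\big) \qquad (6.50)
\end{aligned}$$

for $m_0 \le n \le l_0$. Then, (6.46), (6.48), (6.50) yield

$$\begin{aligned}
u(\theta_{m_0}) + \tilde C_1\psi(u(\theta_{m_0})) &\ge u(\theta_{m_0+1}) - \tilde C_1\gamma_{m_0}^{-\hat p}(\varphi_\varepsilon(\xi))^{\hat\mu} \\
&\ge \big(\tilde C_2\gamma_{m_0+1}^{-\hat p} - \tilde C_1\gamma_{m_0}^{-\hat p}\big)(\varphi_\varepsilon(\xi))^{\hat\mu} \\
&= \big(\tilde C_2(\gamma_{m_0+1}/\gamma_{m_0})^{-\hat p} - \tilde C_1\big)\gamma_{m_0}^{-\hat p}(\varphi_\varepsilon(\xi))^{\hat\mu} \\
&\ge (\tilde C_2/2 - \tilde C_1)\gamma_{m_0}^{-\hat p}(\varphi_\varepsilon(\xi))^{\hat\mu} > 0 \qquad (6.51)
\end{aligned}$$

(notice that $(\gamma_{m_0+1}/\gamma_{m_0})^{\hat p} \le (\gamma_{l_0}/\gamma_{m_0})^{\hat p} \le 2$; also notice that $\tilde C_2/2 \ge 3\tilde C_1$), while
(6.44), (6.50) imply

$$\begin{aligned}
u(\theta_n) &\le (1 + \tilde C_1)u(\theta_{m_0}) + \tilde C_1\gamma_{m_0}^{-\hat p}(\varphi_\varepsilon(\xi))^{\hat\mu} \\
&\le (C_7/2)(\gamma_n/\gamma_{m_0})^{\hat p}\gamma_n^{-\hat p}(\varphi_\varepsilon(\xi))^{\hat\mu} \\
&\le C_7\gamma_n^{-\hat p}(\varphi_\varepsilon(\xi))^{\hat\mu} \qquad (6.52)
\end{aligned}$$

for $m_0 \le n \le l_0$ (notice that $(\gamma_n/\gamma_{m_0})^{\hat p} \le (\gamma_{l_0}/\gamma_{m_0})^{\hat p} \le 2$ for $m_0 \le n \le l_0$; also
notice that $C_7/2 = (\tilde C_1 + \tilde C_2)^2 \ge \tilde C_1 + \tilde C_2 + \tilde C_1\tilde C_2$). Due to (6.45), (6.47), (6.52),
we have $l_0 < n_0$. On the other side, since $x + \tilde C_1\psi(x) > 0$ only if $x > 0$ and since
$x + \tilde C_1\psi(x) = (1 + \tilde C_1)x$ for $x \ge 0$, inequality (6.51) implies

$$u(\theta_{m_0}) \ge (1 + \tilde C_1)^{-1}(\tilde C_2/2 - \tilde C_1)\gamma_{m_0}^{-\hat p}(\varphi_\varepsilon(\xi))^{\hat\mu} \ge C_2\gamma_{m_0}^{-\hat p}(\varphi_\varepsilon(\xi))^{\hat\mu} \qquad (6.53)$$

(notice that $\tilde C_2/2 - \tilde C_1 \ge \tilde C_1(3C_2 - 1) \ge 2\tilde C_1C_2 \ge (1 + \tilde C_1)C_2$).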

In what foUows in the proof, we consider separately the cases /t < 2 and (1 — 2. 
Case fi<2: Owing to Lemma (relations (p?^ . (lOSI ) and ([Oil) . we 

have 

v{ei„)>v{e„,„) + {ilC:i){^,{s,))-^/^ 

(notice that i > "fi„ — ^ma\ also notice C2 < C^^). Consequently, 

However, this directly contradicts (|6.46p and the fact that Iq < uq. Thus, (|6.43p holds 
when jl < 2. 

Case fi = 2: Using Lemma l675l (relations (j6.2ip . (|6.24p ) and (|6.53p . we get 

u{0i„) < {l-i/C3)u{9^„). 

Then, ([OSl yield 

However, this is impossible due to (|6.46p and the fact that Iq < uq. Hence, (|6.43p also 
in the case fi — 2. D 

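Lemma 6.10. Suppose that Assumptions 2.1 - 2.3 hold. Then, for any real
numbers $\varepsilon \in (0,\infty)$, $s \in (1,r] \cap (1,\hat p)$, there exist a random quantity $B_s$ (which is a
deterministic function of $s$, $\hat p$, $\hat C$, $\hat M$ and does not depend on $\varepsilon$) and a non-negative
integer-valued quantity $\sigma_{\varepsilon,s}$ such that the following is true: $1 \le B_s < \infty$, $0 \le \sigma_{\varepsilon,s} < \infty$
everywhere and

$$\sup_{k\ge n}\|\theta_k - \theta_n\| \le B_s\gamma_n^{-\hat p+s}(\varphi_\varepsilon(\xi))^{\hat\mu}(\phi_{\varepsilon,s}(\xi))^{-1} + B_s\gamma_n^{-s+1}\phi_{\varepsilon,s}(\xi) \qquad (6.54)$$

on $\Lambda\setminus N_0$ for $n > \sigma_{\varepsilon,s}$.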

Proof. Let e G (0, 00), s G (1, r] n (l,p) be arbitrary real numbers, while Ci = 
2{C2 + C7), C2 = 2C1C5, C3 = 2C1C2. Moreover, let 

Bs = (3/i){Ci + (74)(1 + s/{p -s) + l/{s - 1)) 

and Bs - Bs + 2C3. 

It is straightforward to show ^a(n,i) — In t + 0(aQ(„^t)) and 

7a(„,t) - In =lain,i) " " ^Mn.i) " 7n)/7a(„,t)) ) 

==7^ ( sh^^ + 0(7~,^ ,j) (6.55) 

ia{n,t) \ 'a(n,t) ^ 'a{n,t)'J ^ ' 

for n 00. On the other side. Lemmas 16. 71 and 16.91 implv 

limsup7P|M(0„)| < max{C2,C'7}(^s(e))'' (6.56) 

n—^OQ 
on $\Lambda\setminus N_0$, while Lemma 6.7 and (6.56) yield

$$\limsup_{n\to\infty}\gamma_n^{\hat p}\|\nabla f(\theta_n)\|^2 \le C_5\limsup_{n\to\infty}\gamma_n^{\hat p}\psi(u(\theta_n)) + C_5(\varphi_\varepsilon(\xi))^{\hat\mu} \le 2C_5\max\{C_2, C_7\}(\varphi_\varepsilon(\xi))^{\hat\mu} \qquad (6.57)$$

on the same event. Then, owing to (6.55) - (6.57), it is possible to construct a
non-negative integer-valued random quantity $\sigma_{\varepsilon,s}$ such that $0 \le \sigma_{\varepsilon,s} < \infty$ everywhere
and such that

7a(„,t) -ln> i/2, (6.58) 
7^ - In < sY7\., (6-59) 

'a(n,t) — ia(n,t)^ ^ ' 

\u{^n)\<CxlV^^We{i)r. (6.60) 
||V/((?„)|| < C27,7^+^(¥'s(O)''(0e,.(O)'' + C27,;^+Ve,.(e) (6.61) 

on $\Lambda \setminus N_0$ for $n > \sigma_{\varepsilon,s}$.

Let $\omega$ be an arbitrary sample from $\Lambda \setminus N_0$ (notice that all formulas which follow
in the proof correspond to this sample). Moreover, let $\{n_k\}_{k \geq 0}$ be recursively defined
by $n_0 = \sigma_{\varepsilon,s} + 1$ and $n_{k+1} = a(n_k, t)$ for $k \geq 0$. Then, due to Lemma 6.6, we have

i-i 

i—k 

l-l l-l 

i=k i—k 
I l-l 

i—k--\-l i—k 
+ HOn. ) I (0,,, (0)-l + 7^, HOn, ) I {C^e,s (0)"' 

for < fc < /. Consequently, ([6?59ll . ([HJOll yield 

||^?„, -0«J| <Cls{^,{^)Y{cj^eAOr' E 7,7/+'^-'+C4</>e,.(0E^".' 

-i— A;+l i— A; 

+ ^i(7„7+^ + 7„7+^)(^e(C))''(0e,.(e))-' (6.62) 

for < fc < /. Since 

l-l 

7n, = In^ + E'^^'^'+i ~ - ^"-^ + " 

i—k 

for < /c < / (owing to (|6.58p ). we get 



E7n.'-^<E(7«^+*^V2)-^-^ 



i=k 







^ To deduce that (6.61) holds on $\Lambda \setminus N_0$ for all but finitely many $n$, notice that $\|\nabla f(\theta_n)\| = O(\gamma_n^{-\hat{p}/2})$
on $\Lambda \setminus N_0$ and that at least one of $\hat{p}/2 > \hat{p} - s$, $\hat{p}/2 > s - 1$ is true.
for $k \geq 0$ and $\lambda \in (0, \infty)$. Then, (6.62) implies

+ 3C4f-i(s-i)-i7-;+Ve,.(C) 

<B,^-f+^{^,{^)f{<j,,,,{i)r^ + B,7„7+Ve,.(e) (6.63) 

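for $0 \leq k < l$. On the other side, since $s - 1 \leq r$ and $\phi_{\varepsilon,s}(\xi) \geq \xi + \varepsilon$, Lemma 6.2
(inequality (6.4)) and (6.61) yield

\[ \|\theta_k - \theta_n\| \leq \hat{C}_1 \left( \|\nabla f(\theta_n)\| + \gamma_n^{-r} (\xi + \varepsilon) \right) \leq \hat{C}_1 \left( \|\nabla f(\theta_n)\| + \gamma_n^{-s+1} \phi_{\varepsilon,s}(\xi) \right) \]

for $\sigma_{\varepsilon,s} < n \leq k \leq a(n,t)$ (notice that $\sigma_{\varepsilon,s} \geq \tau_{1,\varepsilon}$). Combining this with (6.63), we
obtain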

ll^fe - OJ <\\ek - 0n,\\ + \\0n, - 0nA\ + Pn, " 

for $\sigma_{\varepsilon,s} < n \leq k$, $1 \leq i \leq j$ satisfying $n_{i-1} \leq n < n_i$, $n_j \leq k < n_{j+1}$. Then, it is

obvious that (6.54) is true. □

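Lemma 6.11. Suppose that Assumptions 2.1 - 2.3 hold. Then, there exists a
random quantity $\hat{C}_8$ (which is a deterministic function of $r$, $\hat{\mu}$, $\hat{C}$, $\hat{M}$) such that the
following is true: $1 \leq \hat{C}_8 < \infty$ everywhere and

\[ \limsup_{n \to \infty} \gamma_n^{\hat{q}} \sup_{k \geq n} \|\theta_k - \theta_n\| \leq \hat{C}_8 \phi_\varepsilon(\xi) \tag{6.64} \]

on $\Lambda \setminus N_0$ for any $\varepsilon \in (0, \infty)$.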

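Proof. Let $s = \min\{r, \hat{r}\}$ and $\hat{C}_8 = 2\hat{B}_s$, while $\varepsilon \in (0, \infty)$ is an arbitrary real
number. Moreover, let $\omega$ be an arbitrary sample from $\Lambda \setminus N_0$ (notice that all formulas
which follow in the proof correspond to this sample). In order to show (6.64), we
consider separately the cases $r < \hat{r}$ and $r \geq \hat{r}$.

Case $r < \hat{r}$: We have $s = r$, $\hat{q} = r - 1$ and $\hat{p} = \hat{\mu} r = r(2 - 1/\hat{r})$. Consequently,
$\hat{p} - r = r - r/\hat{r} > 0$, $\hat{p} - 2r + 1 = 1 - r/\hat{r} > 0$ (notice that $r/\hat{r} < 1 < r$), i.e., $s < \hat{p}$,
$\hat{q} < \hat{p} - s$. Then, Lemma 6.10 implies

\[ \limsup_{n \to \infty} \gamma_n^{\hat{q}} \sup_{k \geq n} \|\theta_k - \theta_n\| \leq \hat{B}_s \phi_{\varepsilon,s}(\xi) \leq \hat{C}_8 \phi_\varepsilon(\xi) \]

(notice that $\lim_{n \to \infty} \gamma_n^{-\hat{p}+s+\hat{q}} = 0$ and $\phi_{\varepsilon,s}(\xi) = \phi_\varepsilon(\xi)$). Thus, (6.64) holds when
$r < \hat{r}$.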

Case $r \geq \hat{r}$: We have $s = \hat{r}$, $\hat{q} = \hat{r} - 1$ and $\hat{p} = \hat{\mu} \hat{r} = 2\hat{r} - 1$. Therefore, $\hat{q} = \hat{p} - s$
and $s = (\hat{p} + 1)/2 < \hat{p}$ (notice that $\hat{p} > 1$). Then, Lemma 6.10 yields

limsup7|sup||0fc-0„l| <BsiMO)A<t>eAOr' + Bs4>eAi) 

n-^oo k>7i 

<Bs{^e.AOf-' + Bs^.iO 
(notice that 4>e.si£,) = y^siO^ since both r — s, r > f cannot hold if r > f; also notice
that $\hat{\mu} - 1 \leq 1$ and that $\phi_\varepsilon(\xi) \geq 1$ if $r \geq \hat{r}$). Hence, (6.64) is true when $r \geq \hat{r}$. □

Proof of Theorems 12.11 and 12.21 Owing to Lemmas 1131 and 16.101 9 = 
lim„^oo 6n exists and satisfies Vf{d) = on A \ Nq. Thus, Theorem 12.21 holds. In 
addition, we have Q C{e eR'^" : p - 9\\ < 6g} on A \ iVo {Se is specified in Remark 
12. ip . Therefore, on A \ A^o; random quantities /t, p, f defined in this section coincide 
with /t, p, f specified in Theorem 12. 21 fsee Remark l2.ip . Similarly, C, M introduced in 
this section are identical to C^, Mg (specified in Section[2) on A\ A^'q. Thus, Theorem 
12. H is true. 

Let K = 2(75 (Cs + Cj) + Cg. Then, Lemmas 16.51 16.81 and the limit process £ ^ 
imply 

limsup7^k(0„)| < iC2 + C7)M0f < KMOf 

n — >oo 

on A \ TVo- Consequently, Lemma [631 yields 

limsup7^||V/(0„)f <C^{^{Of +C^\\^snviii^{u{e^)) < Ki^f 

on A \ A^o- On the other side, using Lemma [6. Ill we get 

limsup7,tl|^?„ - ^11 < CMO < K^{i) 

n^oc 

on A \ TVq. Hence, Theorem!^ holds, too. □ 

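Remark 6.4. Owing to Lemma 6.10, $\{\theta_n\}_{n \geq 0}$ converges to $\hat{\theta}$ on $\Lambda \setminus N_0$ at the
rate $O(\gamma_n^{-\min\{\hat{p}-s,\, s-1\}})$, where $s \in (1, r] \cap (1, \hat{p})$. It is straightforward to show

\[ \hat{q} = \max_{s \in (1,r] \cap (1,\hat{p})} \min\{\hat{p} - s, s - 1\}. \]

This suggests that $O(\gamma_n^{-\hat{q}})$ is the tightest bound on the convergence rate of $\{\theta_n\}_{n \geq 0}$
which can be obtained by the arguments Lemmas 6.9, 6.10 and 6.11 are based on.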
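
The max-min identity in Remark 6.4 can be checked numerically. The sketch below is only an illustration under the assumptions p > 1 and r > 1: the helper names `best_rate_grid` and `best_rate_closed_form` are hypothetical, and `p`, `r` stand in for the quantities written p and r in the remark. It compares a brute-force grid search over s with the closed form suggested by the case analysis in the proof of Lemma 6.11 (the unconstrained peak of min{p - s, s - 1} sits at s = (p + 1)/2 and is clipped to s <= r).

```python
def best_rate_grid(p, r, steps=20_000):
    """Approximate the sup over s in (1, r] ∩ (1, p) of min(p - s, s - 1) on a grid."""
    upper = min(r, p)
    best = 0.0
    for i in range(1, steps + 1):
        s = 1.0 + (upper - 1.0) * i / steps   # s runs over (1, upper]
        if s >= p:                            # s must stay strictly below p
            continue
        best = max(best, min(p - s, s - 1.0))
    return best


def best_rate_closed_form(p, r):
    """min(p - s, s - 1) peaks at s = (p + 1) / 2; clip that point to s <= r."""
    s_star = min(r, (p + 1.0) / 2.0)
    return min(p - s_star, s_star - 1.0)
```

For instance, with p = 3 the unconstrained optimum is s = 2, so any r >= 2 gives the rate exponent (p - 1)/2 = 1, while r = 1.5 caps the exponent at r - 1 = 0.5.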

7. Proof of Theorem 3.1. The following notation is used in this section. For
$\theta \in \mathbb{R}^{d_\theta}$, $z \in \mathbb{R}^{d_z}$, $E_{\theta,z}(\cdot)$ denotes $E(\cdot \mid \theta_0 = \theta, Z_0 = z)$. Moreover, let

C„ = F(0„,Z„+i)-V/(^„), 

= F{On, Zn+l) - {UF){9n, Z„), 
6,n - iUF){9n,Zn) ~ (HF) (0„_ i , Z„) , 
Cs.Ti — ~{n-F){9n, Zn+l) 



for $n \geq 1$. Then, it is obvious that algorithm (3.1) admits the form (2.1), while
Assumption 3.2 yields

i—n i—n i—n i—n 

- afc+i7fe+i6,fc + an7^6,n-i (7-1) 

for $1 \leq n \leq k$.
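
Decompositions of the kind numbered (7.1) rest on summation by parts. The sketch below checks the generic identity sum_{i=n}^{k} a_i (b_i - b_{i-1}) = sum_{i=n}^{k} (a_i - a_{i+1}) b_i + a_{k+1} b_k - a_n b_{n-1} on arbitrary test data. The sequences `a` and `b` and the function names are illustrative assumptions; the code does not encode the paper's sign conventions for the individual noise terms.

```python
import random


def lhs(a, b, n, k):
    """sum_{i=n}^{k} a[i] * (b[i] - b[i-1])"""
    return sum(a[i] * (b[i] - b[i - 1]) for i in range(n, k + 1))


def rhs(a, b, n, k):
    """Summation by parts: differences of a against b, plus two boundary terms."""
    interior = sum((a[i] - a[i + 1]) * b[i] for i in range(n, k + 1))
    return interior + a[k + 1] * b[k] - a[n] * b[n - 1]


random.seed(0)
a = [random.random() for _ in range(50)]         # plays the role of weights
b = [random.gauss(0.0, 1.0) for _ in range(50)]  # plays the role of a noise term
gap = abs(lhs(a, b, 3, 40) - rhs(a, b, 3, 40))   # should vanish up to rounding
```

The point of the rearrangement is that the differences a_i - a_{i+1} are typically much smaller than a_i itself, so the telescoped sum is easier to control than the original one.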

Lemma 7.1. Let Assumption 3.1 hold. Then, there exists a real number $s \in (0, 1)$
such that X^^^o ^n^^'ln < 
Proof. Let $p = (2 + 2r)/(2 + r)$, $q = (2 + 2r)/r$, $s = (2 + r)/(2 + 2r)$. Then, using
the Hölder inequality, we get



l/P / oo \ 1/9 



OO oo / \ 1/q / oo \ /P / oo 

n=0 n=l \ /n/ \„^;^ / \„^i In 



Since 7„+i/7„ = 1 + q:„/7„ = 0(1) for n ^ oo and 



7„+l -7n / / 7n+A^ Z"'""^' 1 / 7n+l^ ^ 



EOn _ \ - 7«+l ~ 7n ^ \ - 
^2 ^^2 .Zl-/ 



< — max 



it is obvious that X^^o '^n''^''7r! converges. □ 
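
The exponents chosen at the start of the proof of Lemma 7.1 can be verified with exact rational arithmetic. The sketch below (the function name `holder_exponents` is hypothetical, and r is restricted to positive rationals purely so that the arithmetic is exact) confirms that p and q are conjugate Hölder exponents, that s = 1/p, and that s lies in (0, 1).

```python
from fractions import Fraction


def holder_exponents(r):
    """Return (p, q, s) as defined in the proof of Lemma 7.1 for a rational r > 0."""
    r = Fraction(r)
    p = (2 + 2 * r) / (2 + r)
    q = (2 + 2 * r) / r
    s = (2 + r) / (2 + 2 * r)
    return p, q, s


p, q, s = holder_exponents(Fraction(3, 2))
conjugate = (1 / p + 1 / q == 1)   # Hölder requires 1/p + 1/q = 1
s_equals_inv_p = (s == 1 / p)      # s is the reciprocal of p
s_in_unit_interval = (0 < s < 1)
```

The same identities hold symbolically: 1/p + 1/q = (2 + r)/(2 + 2r) + r/(2 + 2r) = 1 for every r > 0.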

Proof of Theorem 13.11 Let Q c K'^" be an arbitrary compact set, while 
s e (0, 1) is a real number such that J2'^=o^n^^^n < Obviously, it is sufficient to 
show that X]r=o ""7n?n converges w.p.l on flj^oi^" ^ Q}- 
Due to Assumption 13.11 we have 

a^_ian7n = (1 + an-i(a,T^ - "n-i))'' a«"^''7^ = 0(a^+''7;;), 

(a„_i - a„)7,'; = (a-^ - a;^\) (l + a„_i(a,7^ - a«7r'; = 0(a^,7;), 

an(7n+l - 7n) = Otnln ((1 + OL^IlnY ^ 1) = an7^ {ran/in + o(a„/7„)) = 0(0^7^) 

as n — > 00. Consequently, 

00 

E <an+l7r';+l < C^-^) 
n=0 

00 00 00 

E " an+i7,'i+il < E""l^" + E l"» -"»+il7^+i < oo- (7-3) 

Ti— n— n— 

On the other side, as a result of Assumption I3.3[ we get 

^e.. (116 ."II -^{TQ>n} 
(116 ,n|| I{TQ>n 

for aU 6* e ]R''», z e M''-', 71 > 1. Then, Assumption [3T] and yield 

^e.^ E""^"'^ll^i'"ll^^{^Q>"} - ^ E""^"'^ supSe,^ ((pQ_^(Z„)/{rQ>„}) < 00, 
\„=i / \„=i / "2^0 

Eg,z ( E ll^2,n||/{rQ>n} ) < [ E"»-l""^" ) ™P-^e,^ {ipQ^s{Zn)I{rQ>n}) < OO 
for any $\theta \in \mathbb{R}^{d_\theta}$, $z \in \mathbb{R}^{d_z}$, while (7.3) implies

Ee,z |a«7« - an+i7rVilllC3,nII-f{TQ>«} j 



n>0 



,n|| ^{rQ>n} 1 



n>0 

l'^-. Since 



for each 6 eR"^", z e 

Ee,z {il.nI{TQ>n}\^n) = (-5^61,2 , Z„+i ) |^„^ — (nF)(0„,Z„)^ '^{TQ>n} — 

w.p.l for every 6 G K.'''', z G M''-, n > 1, it can be deduced easily that series 

oo oo oo 

n— 1 n—1 n—1 

converge w.p.l on Pl^oi^" ^ Qi' well as that lim„_^oo ctnJn^3,n-i = w.p.l on 
the same event. Owing to this and (|7.ip . we have that X]^o '^"Tn'fn converges w.p.l 

8. Proof of Theorems 4.1 and 4.2. In this section, we use the following
notation. For $\theta \in \mathbb{R}^{d_\theta}$, $x \in \mathbb{R}^N$, $y \in \mathbb{R}$ and $z = [x^T\ y]^T$, let

\[ F(\theta, z) = -(y - G_\theta(x)) H_\theta(x), \]

while $Z_{n+1} = [X_n^T\ Y_n]^T$ for $n \geq 0$. With this notation, it is obvious that algorithm
(4.1) admits the form (3.1).
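
To make the role of F concrete, the following sketch evaluates F(theta, z) = -(y - G_theta(x)) H_theta(x) for a toy model and cross-checks it against finite differences of the half squared error. Everything specific here is an assumption made for illustration: the scalar model G_theta(x) = a tanh(b x), the reading of H_theta as the gradient of G_theta with respect to theta, and all function names. It is not the network class treated in the theorems.

```python
import math


def G(theta, x):
    """Hypothetical scalar model G_theta(x) = a * tanh(b * x), theta = (a, b)."""
    a, b = theta
    return a * math.tanh(b * x)


def H(theta, x):
    """Gradient of G with respect to theta (assumed meaning of H_theta)."""
    a, b = theta
    t = math.tanh(b * x)
    return (t, a * x * (1.0 - t * t))


def F(theta, z):
    """F(theta, z) = -(y - G_theta(x)) * H_theta(x) with z = (x, y)."""
    x, y = z
    err = y - G(theta, x)
    return tuple(-err * h for h in H(theta, x))


def loss(theta, z):
    """Half squared prediction error; its theta-gradient should equal F."""
    x, y = z
    return 0.5 * (y - G(theta, x)) ** 2


def numeric_grad(theta, z, eps=1e-6):
    """Central finite differences of loss, used only as a cross-check."""
    grads = []
    for i in range(len(theta)):
        up = list(theta)
        up[i] += eps
        dn = list(theta)
        dn[i] -= eps
        grads.append((loss(up, z) - loss(dn, z)) / (2 * eps))
    return tuple(grads)


theta0, z0 = (0.7, -1.3), (0.4, 0.25)
max_gap = max(abs(u - v) for u, v in zip(F(theta0, z0), numeric_grad(theta0, z0)))
```

Under these assumptions F is the gradient of the per-sample loss, which is what makes the supervised learning recursion a stochastic gradient search for the expected squared error.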

Proof of Theorem 14.11 Let 9 = [ai - ■ ■ aM • ■ ■ &j\/,Ar]'^ G M''" , while 



2KLMN{1+ \\9\\) 



and Uq = {rj ^ C^" : ||77 — 9\\ < 6g} (e is specified in Assumption 14. ip . Moreover, for 
= [ci • • • CM di.i • • • dnM'^ G C'*^ a: = [xi • • • xw]^ e M^, let 



M 



AT 



/W-^ j{y^G^{x))\{dx,dy). 



Then, we have 



AT 



N 



N 
for all $\eta = [c_1 \cdots c_M\ d_{1,1} \cdots d_{M,N}]^T \in U_\theta$, $1 \leq i \leq M$ and each $x = [x_1 \cdots x_N]^T \in \mathbb{R}^N$
satisfying $\|x\| \leq L$. Consequently, Assumption 4.1 implies



M 



N 



N 



i=i 



i=l 
M 

i=l 





M 


'Z' ^E ^'J^i j 










< SeKM + kJ2\ 



N N 



< JeifM + (5eis:LM7V||6i|| < e 

for any 77 = [ci • • • ca/ • • • gJm.at]"'" G and each a; = [xi • • • xjv]"^ G satisfying 
llxjl < L. Then, it can be deduced that for all x G satisfying < L, Gri{x) is 
analytical in rj on Ug. On the other side, Assumption 14.11 yields 




< Xi||?7|| 



for all 77 = [ci • • ■ CAf di^i ■ ■ ■ dM,N\^ ^ Ug, 1 < k < AI, I < I < N and each 
X = [xi- ■ ■xn]'^ G satisfying ||x|| < L. Therefore, 

||V^G,(a:)|| <KLMN{l+\\7j\\) 

for any rj E Ug and each x G satisfying ||a;|| < L. Thus, 

||V,(j/ ~ G,(x))2|| ^ 2|y - G,(x)|||V,G,(a;)|| < 2K^LHl^Nil + \\rj\\)' 

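for all $\eta \in U_\theta$ and each $x \in \mathbb{R}^N$, $y \in \mathbb{R}$ satisfying $\|x\| \leq L$, $|y| \leq L$. Then, the
dominated convergence theorem and Assumption 4.2 imply that $\hat{f}(\cdot)$ is differentiable
on $U_\theta$. Consequently, $\hat{f}(\cdot)$ is analytical on $U_\theta$. Since $\hat{f}(\theta) = f(\theta)$ for all $\theta \in \mathbb{R}^{d_\theta}$, we
conclude that $f(\cdot)$ is real-analytic on entire $\mathbb{R}^{d_\theta}$. □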

Proof of Theorem 4.2. As $\{Z_n\}_{n \geq 0}$ can be interpreted as a Markov chain
whose transition kernel does not depend on $\{\theta_n\}_{n \geq 0}$, it is straightforward to show
that Assumptions 3.2 and 3.3 hold. The theorem's assertion then follows directly
from Theorem 3.1. □



9. Proof of Theorems 5.1 and 5.2. In this section, we use the following
notation. For $n \geq 0$, let
while $d_z = L + (M + N)(N + 1)$. For $\theta \in \Theta$, let $\varepsilon_0^\theta = \cdots = \varepsilon_{-N+1}^\theta = 0$, $\psi_0^\theta = \cdots = \psi_{-N+1}^\theta = 0$,
while $\{\varepsilon_n^\theta\}_{n \geq 0}$, $\{\psi_n^\theta\}_{n \geq 0}$ are defined by the following recursion:



9n-l — l-fn-l • • • -Tn-Af £„_! ' ' ' ^n-Nl ) 

zi = [xl y„ • • • y„_M+i £^ (V'')^ • • • £^iv+i (V'^-iv+i)^]'^, « > i- 

Then, it is straightforward to verify that $\{\varepsilon_n^\theta\}_{n \geq 0}$ satisfies the recursion (5.2), as well
as that $\psi_n^\theta = \nabla_\theta \varepsilon_n^\theta$ for $n \geq 0$. Moreover, it can be deduced easily that there exist a
matrix valued function Ge : 9 ^ ^d^xd^ matrix H G W^^'^^ with the following 

properties: 

(i) $G_\theta$ is linear in $\theta$ and its eigenvalues lie in $\{z \in \mathbb{C} : |z| < 1\}$ for each $\theta \in \Theta$.

(ii) Equations 

Zn+l — GgZ^ + HVm Zn+l — Gg^Zn + HVn 

hold for all 6* £ 9, n > 0. 

The following notation is also used in this section. For £ 9, a; £ R^, yi, . . . , um £ 
R, ei, . . . , ejv £ K, /i, . . . , /at £ R''^ , and z = [a;^ yi • • • ei /f • • • bat /^]^, let 

F(0,z) = /iei, m=el 

while 

ne(z,B) = £;(/B(Gez + i/K))) 

for a Borel-measurable set $B$ from $\mathbb{R}^{d_z}$. Then, it can be deduced easily that recursion
(5.3) - (5.6) admits the form of the algorithm considered in Section 3. Furthermore,
it can be shown that

{Ii-^){e,Q)^E{{ey), (9.1) 

0) = i?(^:£:) = v,(n»(^, o) (9.2) 

for each 6* £ 9, n > 0. 

Proof of Theorem 15.11 Let m = £'(lo) and = r_fe = Cov(Fo,yfe) for 
fc > 0, while 



iujk 
k— — oo 



for uj £ [— TT, tt]. Moreover, for £ 9, z £ C, let Cg{z) = Ag{z) / Bg{z), while 

a9 = l+ max |A9(e^")|, f3g ^ min |Be(e'")|, 5, = -^. 

we[-7r,7r] tje[-7r,Tr] idgag 

Obviously, $1 \leq \alpha_\theta < \infty$, $0 < \beta_\theta, \delta_\theta < \infty$ (notice that the zeros of $B_\theta(\cdot)$ are outside
$\{z \in \mathbb{C} : |z| \leq 1\}$).
As $\sum_{k=0}^{\infty} |r_k| < \infty$, $|\varphi(\cdot)|$ is uniformly bounded. Consequently, the spectral theory
for stationary processes (see e.g. [3, Chapter 2]) yields

lim E{ei) = Ce{l)Tn, 



hm Cov(4,4+fe) = ^ / |C,(e-)|V(^)e^'^'^rf'^ 

for all 9 e Q, k > (notice that = Ce{q)Yn and the poles of Cg{-) are in {z G 
|z| > 1}). Therefore, 



fiO) 



in 



\Cg{e'^)\^y,iu;)duj + \Ce{lW 



(9.3) 



for any 6* G 6. On the other side, it is straightforward to verify 



oak 



— iiok 



92 



dakida^ 
Qh^ — hi 



1 



\ h+h^ hijv + 1 



for every 6* = [ai • • • om 61 • • • b^]'^ G 8, a; G [— tt, tt], 1 < fc, /ci, fc2 < M, /i, . . . , Zjv > 0. 
Thus, 



^felH hfeM- 



■ ■ ■ da''^' db[' ■ ■ ■ db^ff 



Qki^ hfcM 



C(,(e^") 



^ ^^e(e" 

da\' ■ ■ ■ da\Y 



for all 6 = [ai • • • um bi ■ ■ ■ &Ar]"^ G 8, a; G [—tt, tt], ki, . . . , fcjv/ > 0, li, . . . ,1^ > 0. 
Then, it can be deduced easily 



d' 



ki-i hfcdfl 



(9t9^ • • ■ dr^"" 



for all ^ G B, c<j G [— TT, tt], fci, . . . , > (i^^ denotes the i-th component of 6). Since 

ki ^dfi 



d 



feiH hfcd 



• • • 



Q{ki-ji)A \-{kdg-3dg) 
for each $\theta \in \Theta$, $\omega \in [-\pi, \pi]$, $k_1, \ldots, k_{d_\theta} \geq 0$, we have



<(fci + --- + fcdj!( ^ 



ae 



fci 



( 2ae 

V /3e 



V ••• V — 

^'fcl + ■■■fcde^ 
ji=0 Odg=0 \ji + ---jdg' 

ki + --- + ka„+2 fci ''dg . 

ki + ---+ka„+2 



for any G 0, w S [— 7r,7r], ki, . . . , fc^e > 0. Consequently, the multinomial formula 
(see [m Theorem 1.3.1]) implies 



Oo 



hkd„ 



E--E 

fci=0 kag=0 

2 oo 



ki\---kds\ 



Qki + ---+kde . 

5^^^ • • • d-dif 



< 



E--E 

fci=0 fcdg=0 
2 oo 

E E 

n=0 0<fci,...,fcde<« 
ki + ---kdg =n 

2 oo 



(fci + --- + fcrfj! r 2ae6g 
ki\---kdg\ \ I3e 



ki-i ykda 



ki\---kdg\ V 



E 



n=0 
2 oo 



n=0 



< OO 



for every $\theta \in \Theta$, $\omega \in [-\pi, \pi]$. Then, the analyticity of $f(\cdot)$ directly follows from (9.3)
and the fact that $|\varphi(\cdot)|$ is uniformly bounded (also notice that $C_\theta(1)$ is analytic in $\theta$).
□

Proof of Theorem 5.2. It is straightforward to show
max{||i^(0,z)|U(z)}< ||z||, 

max{||^^(^?, z') F{6, z")\\, |</)(z') - 0(z")|} < 2||z' - z"||(||z'|| + ||z"||) 

for all 6* G 9, z,z',z" G W^' . Moreover, it can be deduced easily that for any 
compact set Q C W^" , there exist real numbers <5i q G (0, 1), Ciq G [1, oo) such that 
won < Ci^qSIq and 

\\Ge'-Ge,,\\ < Ci,q\\9' - e"\\ 

for each $\theta, \theta', \theta'' \in Q$, $n \geq 0$. Then, the results of [1, Section II.2.3] imply that there
exist a locally Lipschitz continuous function $g : \Theta \to \mathbb{R}^{d_\theta}$ and a Borel-measurable
function $\tilde{F} : \Theta \times \mathbb{R}^{d_z} \to \mathbb{R}^{d_\theta}$ such that



F{e,z)~gi0)^F{0,z)-{nF){0,z) 
for every $\theta \in \Theta$, $z \in \mathbb{R}^{d_z}$. Due to the same results, there exists a locally Lipschitz
continuous function $h : \Theta \to \mathbb{R}$ and for any compact set $Q \subset \mathbb{R}^{d_\theta}$, there exist real
numbers $\delta_{2,Q} \in (0, 1)$, $C_{2,Q} \in [1, \infty)$ such that

max{||(n"F)(0,z)-.g(0)||,|(n»(0,z) < C2,qSIq{1 + \\z\\f , (9.4) 

max{||F(0,z)|U|(nF)(0,z)||} < C2,q(1 + ||^||)^ 
\\F{0\z)^Fi9",z)\\<C2,Q\\9'-e"\\{l + \\z\\r 

for each 0, 9', 6" e Q, z, z', z" e R''- . Combining (HH), with the dominated 

convergence theorem, we get $h(\cdot) = f(\cdot)$, $g(\cdot) = \nabla f(\cdot)$. On the other side, owing to
the fact that $\{X_n\}_{n \geq 0}$ is a geometrically ergodic Markov chain, we have that $\{Y_n\}_{n \geq 0}$
admits a stationary regime for $n \to \infty$. Consequently, Theorem 5.1 implies that $f(\cdot)$
is analytic on $\Theta$. Then, the theorem's assertion directly follows from Theorem 3.1.
□



Appendix. In this section, we prove the claim stated in Remark 2.2. If open
set $V$ specified in Remark 2.2 exists, we can define the following quantities for any
compact set $Q \subset \mathbb{R}^{d_\theta}$ and any $a \in f(Q)$:

( Sq^^, if g n 5 ^ 0, a e f{S) 

SQ,a = < 1, if Q n 5 = 

lmin{l,d(a,/(5))/2}, if a ^ 



2, otherwise 



Mq,. = 1 + sup I ■.9eQ\S, \fi6) -a\< ^q^, 

where Q ^ Q if Q <Z V and Q ^ {9 e Q : d{9, S) < d{Q \ V, S')/2} otherwise. Then, 
it is straightforward to show 

a^/(5)=^inf{||V/(e)|| :0eQ,|/(0)-a| <^Q^4>0, 
Q \ 1/ ^ =^inf{||V/(e)|| : e Q \ Q} > 0, 

Q n 5 ^ =^sup l^^l^ : e Q, 1/(0) - a| < Jq,,| < AfQ^^ < oo. 

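Consequently, $\delta_{Q,a}$, $\mu_{Q,a}$, $M_{Q,a}$ are well-defined and enjoy the following properties:
$0 < \delta_{Q,a} \leq 1$, $1 < \mu_{Q,a} \leq 2$, $1 \leq M_{Q,a} < \infty$ and

\[ |f(\theta) - a| \leq M_{Q,a} \|\nabla f(\theta)\|^{\mu_{Q,a}} \]

for all $\theta \in Q$ satisfying $|f(\theta) - a| \leq \delta_{Q,a}$. Hence, the claim holds.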
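
An inequality of the form |f(theta) - a| <= M ||grad f(theta)||^mu can be probed numerically on a simple example with a degenerate minimum. The sketch below uses the illustrative choice f(t) = t^4 with a = 0, mu = 4/3, M = 1 and delta = 1. These values and the helper name `lojasiewicz_holds` are assumptions made for this example; checking sample points gives numerical evidence only, not a proof.

```python
def lojasiewicz_holds(f, grad, a, mu, M, delta, points):
    """Check |f(t) - a| <= M * |grad(t)|**mu at sample points with |f(t) - a| <= delta."""
    for t in points:
        gap = abs(f(t) - a)
        if gap <= delta and gap > M * abs(grad(t)) ** mu + 1e-15:
            return False
    return True


points = [i / 1000.0 for i in range(-1000, 1001)]

# Degenerate minimum: f(t) = t**4 has zero second derivative at t = 0.
quartic_ok = lojasiewicz_holds(lambda t: t**4, lambda t: 4 * t**3,
                               a=0.0, mu=4.0 / 3.0, M=1.0, delta=1.0, points=points)

# With mu = 2 and the same M, the inequality breaks near t = 0.
quartic_mu2_fails = not lojasiewicz_holds(lambda t: t**4, lambda t: 4 * t**3,
                                          a=0.0, mu=2.0, M=1.0, delta=1.0, points=points)
```

Here |4 t^3|^(4/3) = 4^(4/3) t^4 dominates t^4, whereas for mu = 2 the right-hand side 16 t^6 falls below t^4 once |t| < 1/4, which illustrates why exponents strictly below 2 are needed when the Hessian degenerates.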